{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "801307d3",
   "metadata": {},
   "source": [
    "## PDF\n",
    "\n",
    "可移植文档格式 (PDF)，标准化为 ISO 32000，是 Adobe 于 1992 年开发的一种文件格式，用于以独立于应用程序软件、硬件和操作系统的方式呈现文档，包括文本格式和图像。\n",
    "\n",
    "使用 pypdf 将 PDF 加载到文档数组中，其中每个文档包含页面内容和带有 page 编号的元数据。\n",
    "```SHELL\n",
    "pip install pypdf\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "6dd4a609",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import PyPDFLoader\n",
    "\n",
    "loader = PyPDFLoader(\"./pdf/2403.04667.pdf\")\n",
    "pages = loader.load_and_split()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "b4f6e0c3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='THE SOCIAL IMPACT OF GENERATIVE AI: A NANALYSIS ON\\nCHATGPT\\nA P REPRINT\\nMaria T. Baldassarre\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nmariateresa.baldassarre@uniba.it\\nDanilo Caivano\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndanilo.caivano@uniba.it\\nBerenice Fernández Nieto\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nberenice.fernandeznieto@uniba.it\\nDomenico Gigante\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndomenico.gigante1@uniba.it\\nAzzurra Ragone\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nazzurra.ragone@uniba.it\\nSeptember, 2023\\nABSTRACT\\nIn recent months, the social impact of Artificial Intelligence (AI) has gained considerable public inter-\\nest, driven by the emergence of Generative AI models, ChatGPT in particular. The rapid development\\nof these models has sparked heated discussions regarding their benefits, limitations, and associated\\nrisks. Generative models hold immense promise across multiple domains, such as healthcare, finance,\\nand education, to cite a few, presenting diverse practical applications. Nevertheless, concerns about\\npotential adverse effects have elicited divergent perspectives, ranging from privacy risks to escalating\\nsocial inequality. This paper adopts a methodology to delve into the societal implications of Genera-\\ntive AI tools, focusing primarily on the case of ChatGPT. It evaluates the potential impact on several\\nsocial sectors and illustrates the findings of a comprehensive literature review of both positive and\\nnegative effects, emerging trends, and areas of opportunity of Generative AI models. 
This analysis\\naims to facilitate an in-depth discussion by providing insights that can inspire policy, regulation, and\\nresponsible development practices to foster a human-centered AI.\\nKeywords AI Social Impact ·ChatGPT Social Impact ·Human-centered AI ·Perceptions on ChatGPT ·AI Social\\nConcern\\n1 Introduction\\nIn recent months, the social impact of Artificial Intelligence (AI) has been at the forefront of public debate due\\nprimarily to the introduction of new software systems and technologies, specifically ChatGPT. The rapid development\\nof these technologies has, even more, sparked the debate regarding the advantages, limitations, and risks of Artificial\\nIntelligence’s expanding capabilities. From healthcare to cybersecurity, generative models offer a vast array of practical\\nand prosperous future possibilities. However, concerns regarding potential adverse effects present an opposing viewpoint,arXiv:2403.04667v1  [cs.AI]  7 Mar 2024', metadata={'source': './pdf/2403.04667.pdf', 'page': 0}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nwith arguments spanning from privacy risks to deepening social inequalities. Models like ChatGPT personalize the\\ndigital version of the Delphic oracle, where people expect to find answers to their current problems by automating\\ntasks, seeking ChatGPT’s opinion on various issues, and even requesting advice. Nonetheless, it is essential to question\\nwhether we are genuinely resolving uncertainties or uncovering new ones regarding the scope, boundaries, and prospects\\nof generative models’ societal impact. Some other analysts have referred to an \"AI arms race\" in which companies\\nworldwide strive to showcase the best technology, innovation prowess, and leadership in the AI market. On the\\nother side of the debate, discussions refer to the rapid development of these models and the effectiveness of existing\\nlegal frameworks in safeguarding against unintended adverse outcomes. Amongst all these debates, unquestionably,\\nGenerative AI is currently undergoing a period of accelerated evolution. This evolution inevitably brings about a social\\nimpact akin to the ones experienced through numerous other technological advancements that transformed our society\\nin the past. To date, significant effects have been observed in service provision, education, and scientific analysis.\\nHowever, more profound and concerning impacts also unfold in domains like democracy, inequality, security, and\\nmilitary technology.\\nConsequently, a comprehensive examination and analysis are required to understand the positive and negative social\\nconsequences, emerging trends, and areas of improvement of generative models. 
These studies are needed to address\\npotential vulnerabilities and ensure the development of these technologies considers the diverse social contexts and\\nrealities in which they are deployed.\\nBuilding upon the preceding insights, this analysis adopts a comprehensive approach to explore the societal ramifications\\nand future trajectories of Generative AI, with a specific emphasis on ChatGPT. The analysis is organized as follows:\\nSection 2 examines the potential impacts of ChatGPT across diverse social sectors and the evolution of the debate\\nacross various spheres; Section 3 provides a brief overview of the state-of-the-art of Generative AI models as well\\nas a classification of these; Section 4 presents the study design, goals, and Research Questions, plus the search\\nstrategy. Section 5 includes the data analysis and synthesis, drawing conclusions from the literature review, and sharing\\nvisualizations of our findings. Conclusion and future work close the paper.\\n2 BACKGROUND\\nThroughout history, the advancement of technology has consistently brought about significant transformations in social\\ndynamics. Each new technological breakthrough has sparked debates regarding scientific progress’ advantages and\\npotential hazards. Currently, this discourse encompasses various automated tools, data collection and analysis, and the\\ndigitization of services, among other emerging applications, which have become integral parts of modern-day life.\\nThese novel technological applications pervade numerous societal domains, ranging from education to diplomacy,\\nexerting a profound influence and continuously reshaping various social processes. 
While there is a prevailing and\\noptimistic belief in the positive impact of technology on human progress [McKendrick, 2019], it is also becoming\\nincreasingly apparent that these disruptive forces could potentially engender unforeseen and unintended consequences.\\nIn informatics, sociology, philosophy, and politics, the development of generative models will continue to ignite\\nin-depth discussions on various subjects. These discussions include topics such as regulation, risk mitigation, liability,\\ntransparency, and accountability, as well as the effects on socialization patterns and the trajectory of technological\\ndevelopment itself.', metadata={'source': './pdf/2403.04667.pdf', 'page': 1}),\n",
       " Document(page_content='transparency, and accountability, as well as the effects on socialization patterns and the trajectory of technological\\ndevelopment itself.\\nIn our case, the significance of examining the social impact of ChatGPT stems from its potential to cause significant\\nsocial transformations, despite ongoing debates regarding the magnitude of these changes. ChatGPT is a powerful\\ngenerative model that may impact power dynamics at multiple scales, from individual interactions to broader social\\nstructures [Farrell et al., 2022]. This dynamic occurs in complex social environments marked by disparities, stereotypes,\\nconflicts, and various political and social organization forms. These diverse social contexts, which surround scientific\\nadvances, trigger unpredictable and immeasurable consequences that fundamentally transform how we interact with\\neach other and the world. When a disruptive force permeates a new environment, it encounters various forms of\\nresistance. It triggers unintended negative consequences, demands protection from potential vulnerabilities, causes\\nuncertainty about whether it can be regulated, and other related factors. This critical-resistant front contrasts with\\na skeptical perception that questions the gravity of this new disruption and the alarmist interpretations surrounding\\nemerging technologies, in this case. A third front embodies an optimistic outlook, emphasizing the manifold benefits\\nacross multiple sectors, fostering enthusiasm for potential advancements and enhancements, and envisioning their\\npotential to address significant challenges. Overall, the interaction between these forms of resistance shapes perceptions\\nof generative models’ societal impact and raises critical questions about their implications for the broader social fabric.\\n2', metadata={'source': './pdf/2403.04667.pdf', 'page': 1}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nThese perspectives, as human behavior changes over time, interact and mutually influence and foster one another. In the\\ncase of ChatGPT, we also observe these attitudes, including optimism, pessimism, and skepticism, but the panorama is\\neven more complex as we present below in our literature review.\\nAs evidence of the increasing interest in ChatGPT and AI, Figure 1 depicts the fluctuations in Google search queries for\\n\"ChatGPT\" globally from May 2020 (the release of ChatGPT) until May 2023 (which includes the time frame of our\\nresearch).\\nFigure 1: Search queries evolution on “ChatGPT” from May 2020 to May 2023 via Google Trends ( https://trends.\\ngoogle.it/trends/explore?date=2020-04-28%202023-05-12&q=Chat%20GPT3&hl=en ). The \"note\" in the\\ngraph reflects an improvement in Google’s data collection system implemented on 1 January 22.\\nFig.1, furthermore, depicts a consistent low level of interest until December 2022, when a significant shift occurred,\\ncoinciding with the months following the launch of Open AI’s ChatGPT as a prototype service on 30 November 2022\\n[Gordon, 2022], which attracted global attention. In April and May of 2023, interest peaked.\\nAs we enter the third wave of AI evolution, examining how socialization processes change and assessing scientific\\ninnovations’ potential positive or negative effects on rights and freedoms is critical. Following Pinch and Bijker [Pinch\\nand Bijker, 1984], it is essential to examine the construction of scientific knowledge across different localities and\\ncontexts. 
As a result, the primary goal of this analysis is to assess the evolution of ChatGPT in recent years and to\\nexamine its current perceived impact on various social aspects, in the context of the ongoing wave of AI evolution.\\n3 STATE OF THE ART\\nGenerative pre-training (GP) was a well-known concept in machine learning applications since 2012 [noa, 2012].\\nLater, in 2017 Google introduced the transformer architecture [Vaswani et al., 2017]. These advancements led to the\\nbirth of large language models like BERT in 2018 [Devlin et al., 2019] and XLNet in 2019 [Yang et al., 2019]: these are\\npre-trained transformers (PT) but are not designed to be generative. A language model is a probability distribution over\\nsequences of words [Jurafsky and Martin, 2008]: given any sequence of words of length m, a language model assigns a\\nprobability to the whole sequence. A Large Language Model (LLM) is a language model based on a neural network\\nwith many parameters (typically billions or more). Prior to transformer-based architectures, the best-performing neural\\nNatural Language Processing (NLP) models employed supervised learning from large amounts of manually-labeled\\ndata.\\nThe main drawbacks of using supervised learning are the impossibility to use it on not well-annotated datasets, and\\nalso the prohibitive cost and time required to train extremely large language models [Radford and Narasimhan, 2018].\\nUsually, LLMs trained on a large quantity of data can perform discretely a good number of tasks; anyway, they can be\\nfine-tuned (i.e., further trained on specific data) to execute a specific task with better performance.\\nLater, in 2018 OpenAI [OpenAI, 2018] published its famous article \" Improving Language Understanding by Generative\\nPre-Training \", in which the first Generative Pre-trained Transformer (GPT) system was introduced [Radford and\\nNarasimhan, 2018]. 
GPT is a type of large language model (LLM) used mainly for Generative AI, which is a type of AI\\ncapable of generating various kinds of content, such as text and images, in response to instructions (also known as\\nprompts). Generative AI models learn the patterns and structure of their input training data and then generate new data\\nthat has similar characteristics, according to what has been asked as a prompt.\\n3', metadata={'source': './pdf/2403.04667.pdf', 'page': 2}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\n3.1 Foundational models\\nA foundational model is an AI model trained on broad data at scale such that it can be adapted to a wide range of\\ndownstream tasks. The most famous and performant GPT foundation models are the ones released by OpenAI. The\\nmost recent is GPT-4 [OpenAI, a], for which OpenAI refused to publish the size and training details due to business\\nreasons [OpenAI, b].\\nOther such models include Google’s PaLM [Google, a] and Together’s GPT-JT [Together], which has been reported as\\nthe closest-performing open-source alternative to GPT-3. Meta as well has released a generative foundational language\\nmodel, named LLaMA [Meta]. Foundational GPTs can also handle media (and not only text), both for input and/or\\noutput. For example, GPT-4 is capable of processing text and images as input but only produces text as output.\\n3.2 Task-Specific Models\\nA foundational GPT model is usually further trained to better perform specific tasks and/or handle subject-matter\\ndomains. One of the most used methods for such adaptation is fine-tuning (beyond that done for the foundation model).\\nFine-tuning is an approach in which the weights of a pre-trained (language) model are trained on new data. One example\\nof this is fine-tuning LLM to comprehend and follow instructions: in January 2022, OpenAI introduced InstructGPT\\n[OpenAI, c] –a series of models which were fine-tuned to follow instructions. The gained advantages included higher\\naccuracy, less negative/toxic sentiment, and generally better alignment with user needs. Other examples of task-specific\\nmodels are chatbots, AI systems that engage in human-like conversation; ChatGPT [OpenAI, d] is currently the most\\nfamous chatbot. 
Anyway, other major chatbots currently exist, such as Microsoft’s Bing Chat [Bing] – which uses\\nOpenAI’s GPT-4 [OpenAI, a] (as part of a broader close collaboration between OpenAI and Microsoft) –, Google’s\\ncompeting chatbot Bard [Google, b] and LaMDA [Google, c], Jasper Chat [Jasper], Claude [Anthropic, a].\\nFinally, the text-to-model task is becoming quite popular. To date, some famous models are Dall-E 2 [2],Stable\\nDiffusion [Diffusion], PhotoSonic Art Generator [Writesonic] whose task is the production of images based on\\nuser-provided textual prompts. Following the text-to-image models, also the text-to-video task has been addressed\\nwith a lot of models, such as: Runway [Runway], Meta’s Make-A-Video [Make-A-Video], Google’s Imagen Video\\n[Imagen] and Phenaki [Villegas et al.]. All these models can generate video from text and/or text/image prompts.\\n3.3 Domain-specific models\\nGPT systems can be re-trained to address particular fields or domains. Examples of such models (and apps) are\\nBloombergGPT [Bloomberg] for the financial domain, which should provide help with financial news and information,\\nSlackGPT [Slack] to support the Slack instant-messaging service by providing help and guidance with navigating and\\nsummarizing discussions (based on OpenAI’s API), BioGPT [Microsoft] for the biomedical domain, to provide help\\nwith biomedical literature text generation and mining, CoPilot [GitHub] for the IT source code development domain, to\\nprovide auto-completion capabilities for developers.\\nSometimes domain-specificity is realized via software components, specifically named plug-ins or add-ons. 
For\\nexample, Google Workspace has available add-ons such as GPT for Sheets and Docs – which is reported to aid the use\\nof spreadsheet functionality in Google Sheets.\\n4 STUDY DESIGN\\nTo perform this review, we followed the protocol proposed in [Cartaxo et al., 2018], and we completed the review\\nprocess with the strategies presented in [Kitchenham and Charters, 2007] for performing systematic literature reviews.\\nThe following subsections describe in detail the study design and its execution. The literature review presented in this\\nwork was carried out through the following steps:', metadata={'source': './pdf/2403.04667.pdf', 'page': 3}),\n",
       " Document(page_content='The following subsections describe in detail the study design and its execution. The literature review presented in this\\nwork was carried out through the following steps:\\n1.Goal and Research questions : the goal and the correlated research questions were identified to guide the\\nliterature review;\\n2.Search strategy : defining the strategy to collect previous works published in the literature, including research\\ndatabases and query strings;\\n3.Eligibility criteria definition : the criteria used to filter the collected studies have been defined;\\n4.Data extraction : defining how relevant data were extracted to help answer the research questions;\\n5.Data analysis and synthesis : defining how to organize extracted relevant data to answer the research questions.\\n4', metadata={'source': './pdf/2403.04667.pdf', 'page': 3}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFig. 2 summarizes the review protocol.\\nFigure 2: Research protocol used in the literature review\\n4.1 Goal and Research Question Definition\\nWe formulated the following research questions to analyze the diverse dimensions of ChatGPT’s impact.\\nBased on this goal, we defined the following research questions:\\n•RQ1 : What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\n•RQ2 : What are the emerging trends perceived in ChatGPT development?\\n•RQ3 : Which areas of improvement can be identified in the development of such technologies?\\n4.2 Search strategy\\nThe research has been split into two distinct parts: part one , focused on grey literature, and part two , focused on white\\nliterature. Including both grey literature and academic contributions allowed us to conduct a deeper exploration of the\\nimpact of ChatGPT in various settings and from various perspectives.\\nIn part one – started on 29th November 2023 – we focused on grey-literature sources, like blog posts and news articles\\nfrom multiple domains – such as business, education, technology, and society – emphasizing ChatGPT. Here our goal\\nwas to feel the sentiment of the media and tech sphere and capture feelings on ChatGPT that cannot emerge from\\nwhite-literature sources. 
Furthermore, some pilot searches on white-literature sources produced very few results, which\\ndid not allow us to derive reliable and consistent conclusions.\\nThe string used for part one was:\\n(“ChatGPT” AND “social concerns”) (“ChatGPT” AND “social impact”) (“ChatGPT” AND “Human rights”) (“ChatGPT”\\nAND “society*”) (“ChatGPT” AND “education*”) (“ChatGPT” AND “ethics”)\\nThis string was used to perform a keyword-based search on Google search engine1; the search was performed in\\nprivate-browsing mode, after logging out from personal accounts and erasing all web cookies and history [Piasecki\\net al., 2017].\\nIn the end, this search resulted in 1230 literature sources.\\nWhile executing part one of the research, we continued executing periodic pilot searches for white-literature sources.\\nIn late February 2023, we noticed a notable change: the number of scientific articles addressing ChatGPT increased\\nsignificantly, encompassing diverse approaches and perspectives. This may be due to the fact that academic papers need\\nmore time to be reviewed by peers and published (w.r.t. blog posts and news articles).\\nSo, for part two of the research – started on 22nd February 2023 – we decided to use Google Scholar2as white-literature\\nsearch engine; here we searched for scientific articles of various fields – such as business, education, technology, society,\\nhealthcare. The string used for part two of the research was the following:\\n1https://www.google.it/\\n2https://scholar.google.com/\\n5', metadata={'source': './pdf/2403.04667.pdf', 'page': 4}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\n(\"Large Language Model\" AND \"Social impact\") (\"Large Language Model\" AND Human Rights\") (\"Large Language\\nModel\" OR \"ChatGPT\" AND \"Ethics\")(\"Large Language Model\" AND \"ChatGPT\" AND \"Ethics\") (\"Large Language\\nModel\" OR \"ChatGPT\" AND \"Social concerns\") (\"Large Language Model” AND \"ChatGPT\" AND \"society*\") (\"Large\\nLanguage Model\" AND \"ChatGPT\" AND \"education*\") (\"Chat GPT\" AND \"social concerns\") (\"ChatGPT\" AND \"social\\nimpact\") (\"ChatGPT\" AND \"Human rights\") (\"ChatGPT\" AND \"society*\") (\"ChatGPT\" AND \"education*\") (\"ChatGPT\"\\nAND \"ethics\")\\nIn the end, this search resulted in 86 new literature sources.\\nAll the documents obtained with this search strategy (both in part one and part two) were surveyed using a 3-stages\\ninformation classification process. In the first stage, only the title and keywords of the collected articles were read. In\\nthe second stage, we analyzed the abstract of each article while in the third stage we read the complete article. All these\\nstages were conducted separately and in blind-view way by two of the authors. In case of a disagreement, a third author\\nmanually verified and took the final decision.\\nAll found publications were subjected to the selection criteria outlined in Sec. 
4.3 to determine their relevance for\\ninclusion in the analysis.\\n4.3 Eligibility Criteria Definition and Data Extraction\\nThe selection procedure used for filtering the identified pool of 1316 papers was based on the following criteria:\\n• for every text the content should be mainly related to ChatGPT and LLM,\\n• the content should be written in English.\\nThen, the following criteria helped us to ensure we only included publications providing substantial information to our\\nanalysis, especially on ChatGPT, while we excluded papers with only brief mentions or tangential references:\\n•Forblog posts , we required the author’s name to be consistently provided, and the blogs should be specialized\\nin the relevant subject matter.\\n•Regarding news articles , we preferred those that offered an extensive analysis. We adopted this criterion\\nbecause initial news coverage of ChatGPT tended to be repetitive, often focusing on its capabilities and\\nlimitations and just providing a brief history of the model.\\n•In the case of academic articles , we sought diverse approaches to ensure a comprehensive perspective rather\\nthan lean solely on a single field, such as, for instance, the impact of ChatGPT in education.\\nAfter applying these selection criteria, we selected a total of 71papers from our initial pool of 1316 articles.\\nIn the Data Extraction step, we extracted all relevant data that could help answer any of the research questions. The\\nextraction process was performed by two of the authors and conflicts were solved by a third author in a blind-view way.\\nWe used Atlas.ti3to tabulate and organize data. More detailed information regarding the data and how it was indexed\\ncan be found in the online appendix [Baldassarre et al., 2023].\\n5 DATA ANALYSIS AND SYNTHESIS\\nPart one of the search has been conducted from 29th November 2022 to 22nd February 2023. Part two has been\\nconducted from 22nd February 2023 to 19th May 2023. 
Table 1 details the results obtained in both parts, as well as the\\ndocuments selected once our selection criteria were applied.\\nResearch phase Resources\\nretrievedResources\\nanalyzedResources\\nselected\\nFirst part, until Feb. 22 on Google 1230 300 25\\nSecond part, until May 19 on Google scholar 86 63 46\\nTable 1: Amount of documents collected grouped by research phase.\\n3https://atlasti.com/\\n6', metadata={'source': './pdf/2403.04667.pdf', 'page': 5}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nIn order to answer RQ1, RQ2 and RQ3 we performed an analysis using Atlas.ti. Atlas.ti is a qualitative research tool\\nthat enables the systematic organization of documentary resources by using codes and creating documentary categories.\\nAtlas.ti allowed us to visualize and intuitively present content trends within the analyzed documents. We employed this\\ntool to analyze the selected papers, which were organized in Atlas.ti \"document groups\" following the categories in\\nTable 2.\\nCodes for text analysis in Atlas.ti\\nPositive Impact Negative Impact Emerging trends Areas for improvement\\nBenefits to education Disinformation risk Impact on the tech/AI\\nmarketNeed for appropriate\\nregulation\\nBenefits to customer\\nserviceNegative impact on\\nfreedom of expressionCopyright uncertainty GDPR compliance\\nconcerns\\nBenefits of responses in\\nreal-timeBias concerns Uncertainty over liability\\nfor production failuresUncertainty of\\nclassification under the\\nAI Act\\n... ... ...\\nTable 2: Codes for text analysis in Atlas.ti\\nThe categories and codes presented in Table 2 are derived from an in-depth analysis of the selected papers. The codes\\nwere primarily developed \"in vivo,\" meaning they emerged as potential units of analysis during the reading and analysis\\nprocess. For example, when repetitive references were made to the potential positive impacts of ChatGPT in education,\\nwe created the code \" Benefits for education \". After reviewing and analyzing the documents, the codes were organized\\ninto four categories, which helped to address RQ1, RQ2, and RQ3. The criteria for categorization emerged from\\na thoughtful reflection on the codes and their contextual relevance. In the category of positive impacts, codes such\\nas \"24/7 Availability \" and \" Personalized feedback \" are included since they are frequently mentioned as strengths of\\nChatGPT. 
On the other hand, the category of negative impacts encompasses codes such as \" Bias concerns \", which\\nemerged as a recurring argument when discussing the potentially detrimental effects of this model. Other codes like\\n\"Privacy concern \" and \" Water footprint \" were identified, further emphasizing the importance of addressing these issues\\nin the context of ChatGPT’s implementation.\\nIn the category of emerging trends , we captured the unforeseen consequences, which, although potentially negative,\\narise unexpectedly from the evolution of the model itself. Examples include \" Copyright uncertainty \" and the \" Need\\nto clarify private sector liability \", which pose challenges that trigger transformations in particular domains such as\\nthe \" Impact on the tech/AI market \". This category also encompasses unexpected challenges to the AI Act [European\\nCommission, 2021] and its coverage of generative models. Another aspect within this category is \" Unintentional\\nmisinformation \", referring to instances where the chat model provides unintentionally inaccurate information, a matter\\ncurrently under scrutiny by various experts. Additionally, the category encompasses codes like \" Skepticism about its\\nactual impact \". These emerging trends shed light on the complex issues ChatGPT presents.\\nThe final category encompasses areas of improvement , focusing on aspects that require further development to address\\nall the adverse and unexpected effects of the model. For instance, the tag \" Negative outcomes mitigation \" highlights\\nthe need for more efforts to minimize adverse consequences. The categorization of codes as \" Uncertainty in data\\ngovernance \" also responds to the criteria as an area of improvement rather than a negative impact. 
The above-mentioned\\ndecision was made considering that current legal frameworks do not adequately account for models like GPT, thus\\nhighlighting the need for specific measures to address these cases, which will likely be developed in the coming years.\\nThis category also encompasses improvement areas, such as \" Limited up-to-date information \" and \" Limited Medical', metadata={'source': './pdf/2403.04667.pdf', 'page': 6}),\n",
       " Document(page_content='This category also encompasses improvement areas, such as \" Limited up-to-date information \" and \" Limited Medical\\nterminology \", emphasizing the potential for enhancements.\\nAfter establishing the codes and categories, we visually represented their distribution across the 71 documents. The\\nresulting tree map in Atlas.ti, depicted in Figure 3, displays the frequency of the most repeated codes.\\n7', metadata={'source': './pdf/2403.04667.pdf', 'page': 6}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFigure 3: Treemap- Atlas.ti with codes distribution\\nTable 3 also shows the frequency of codes, highlighting the least and most recurrent codes within the Atlas.ti analysis.\\nMost recurrent # Least recurrent #\\nBias concerns 39 Unpredictability risk 1\\nDisinformation risk 25 Prone to injection attacks 1\\nBenefits for education 22 Over-regulation risk 1\\nPrivacy concern 21 Opportunity to increase renewable energy use 1\\nNeed for appropriate regulation 20 Need for using Renewable Energy Sources 1\\nDiscrimination risk 18 Need for human rights safeguards 1\\nUnfairness in the data use in the model 18 Need for explainability and traceability 1\\nBenefits for customer service 17 Need for Accessibility and Affordability 1\\nInaccurate answers 17 Limited Medical terminology 1\\nBenefits for content creation 15 Lead people into extremist positions risk 1\\nTable 3: Most and least recurrent codes; each category is associated with its number of occurrences.\\nIn addition, Fig.4 displays a Sankey diagram that graphically depicts the distribution of code categories across the\\nanalyzed documents (in document groups-unit). The diagram shows that scientific papers, blog posts, conference\\nsymposiums, and other types of publications encompass all coding groups (negative and positive impacts, areas of\\nimprovement, and emerging trends). See Table A2 in online appendix [Baldassarre et al., 2023] for a description\\nof document categories. The articles category includes just emerging trends, areas for improvement, and negative\\nperceptions. Most document categories, including scientific papers, columns, analyses, editorials, and articles, exhibit a\\nnegative tendency. Notably, positive perceptions are more prevalent in blog-posts, conference and symposium papers,\\nand to a lesser degree in news articles. 
\" Bias concern \" and \" Disinformation risk \" are two of the most common codes\\ncontributing to negative perceptions. In contrast, \" Benefits for customer service \" and \" Multidisciplinary benefits \" are the\\nmost prevalent codes in the documents with a positive trend. Table A3 in online appendix [Baldassarre et al., 2023]\\ndetails tendencies and code frequency across all document categories.\\n8', metadata={'source': './pdf/2403.04667.pdf', 'page': 7}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFigure 4: Sankey diagram illustrating the distribution of code groups across document groups\\n6 Discussion\\nRQ1. What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\nThroughout our literature review, we identified several positive and negative impacts attributed to ChatGPT. Noteworthy\\nbenefits include: the potential for enhancing customer service; multiple papers emphasize the positive impact of\\nChatGPT in this domain [International Journal of Human Rights Law Review, 2023, Ray, 2023, Paul et al., 2023,\\nHillemann and Zimprich, 2023, Davis, 2022, Lock, 2022, Deo, 2023, Rivas and Zhao, 2023, Kumordzie, 2023, Levente,\\n2023, Marr, 2023, Abdullah et al., 2022, Khowaja et al., 2023, Gupta et al., 2023, Bruff, 2023, Iskender, 2023]. The\\nmodel is highlighted as an enabler of cross-cultural dialogue, facilitating communication between individuals from\\ndifferent cultural backgrounds [International Journal of Human Rights Law Review, 2023]. Moreover, ChatGPT offers\\nthe advantage of automating repetitive tasks , freeing time for more complex and value-added activities [Davis, 2022,\\nLevente, 2023, Khan and Umer, 2023, Gupta et al., 2023]. These benefits extend to various sectors, including business\\nand healthcare [Deo, 2023]. Another key advantage is its availability around the clock . This 24/7 accessibility proves\\nvaluable in commercial, healthcare, and educational contexts [Paul et al., 2023, Levente, 2023, Lee, 2023, Sun and\\nHoelscher, 2023, Cardoso, 2023]. 
The model’s continuous availability ensures timely assistance and support, covering\\nusers’ diverse needs.\\nChatGPT also demonstrates significant potential in education , offering various advantages [Marr, 2023, Shidiq, 2023,\\nLee and Yilmaz Soylu, 2023, Sun and Hoelscher, 2023, Mhlanga, 2023, Rivas and Zhao, 2023, Abdullah et al., 2022,\\nBahrini et al., 2023, Ray, 2023, Ausat et al., 2023, Solaiman et al., 2019, Lee and Yilmaz Soylu, 2023, Tiunova and\\nMuñoz, 2023, Iskender, 2023, Bruff, 2023, Geertsema et al., 2023]. Despite concerns about plagiarism and academic\\nintegrity, the model’s integration can enhance teaching practices in several ways. It can, for example, automates\\ncurriculum creation, enabling educators to save time and streamline the process [Marr, 2023, Bahrini et al., 2023].\\nMoreover, it facilitates the development of innovative educational content, fostering an engaging learning environment\\n[Lee, 2023, Shidiq, 2023]. Additionally, the model serves as a personalized study support assistant, providing tailored\\nguidance and assistance to individual learners [Ray, 2023, Rivas and Zhao, 2023, Sun and Hoelscher, 2023, Mhlanga,\\n2023, Abdullah et al., 2022, Ausat et al., 2023]. In this perspective, a vision emphasizes the need for controlled\\nintegration and adherence to academic guidelines to ensure responsible and ethical use of generative AI models in\\neducation [Sun and Hoelscher, 2023]. By establishing appropriate regulations and ethical frameworks, the educational\\nbenefits of ChatGPT can be maximized while addressing concerns related to plagiarism and promoting an enriching\\nlearning experience.\\nIn the medical field, the model has shown promise in research, data analysis, and telemedicine applications, contributing\\nto advancements in healthcare [Bahrini et al., 2023]. Furthermore, diverse papers recognize its potential contribution to\\n9', metadata={'source': './pdf/2403.04667.pdf', 'page': 8}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\naddressing environmental challenges . For instance, it may help find innovative solutions to reduce water consumption\\nin the AI industry, highlighting its role in promoting sustainability [George et al., 2023]. Similarly, ChatGPT is also\\nappreciated for its potential as an informative and accountability tool within institutions [Ray, 2023, Cardoso, 2023,\\nBiswas, 2023a]. Lastly, it positively impacts journalism, where it can help with content creation, fact-checking, and\\ngenerating engaging narratives [Davis, 2022, Marr, 2023, Bahrini et al., 2023].\\nConversely, the most recurrent social concern is bias. Several papers [International Journal of Human Rights Law\\nReview, 2023, Ray, 2023, Paul et al., 2023, Lee, 2023, Shidiq, 2023, Biddle, 2023, C and J, 2023, Equality Now,\\n2023, Robson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023a, Vidhya et al., 2023, Khan and Umer, 2023, Ausat et al.,\\n2023, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Treude and Hata, 2023, Farrell et al., 2022, Tiunova\\nand Muñoz, 2023, Thirunavukarasu et al., 2023, Stepanechko and Kozub, 2023, Geertsema et al., 2023] discuss the\\nrisk of deepening existing biases and how ChatGPT can include sexist and racist views due to the characteristics of\\nthe data used during its training. In this sense, there is a special concern about using these tools in various sectors,\\nincluding finance [Khan and Umer, 2023] and other social activities, which can replicate and deepen structural and\\nhistorical inequalities. 
Our analysis also reveals another perceived negative impact, which is its potential to generate\\nfalse information and facilitate the spread of disinformation [Ray, 2023, Paul et al., 2023, Hillemann and Zimprich,\\n2023, Davis, 2022, Lock, 2022, Bahrini et al., 2023, Equality Now, 2023, Robson, 2023, Li, 2023, Biswas, 2023a,\\nVidhya et al., 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Wolf, 2023, Vallance,\\n2022, Rozado, 2023-03, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Farrell et al., 2022, Tiunova and\\nMuñoz, 2023, Khowaja et al., 2023]. This misuse of technology directly affects rights related to access to accurate\\ninformation and freedom of expression, as well as democratic stability. This phenomenon is particularly relevant as,\\nalthough remarkable in artificial intelligence, the advancements and improvements in generative models have raised\\nconcerns and uncertainties regarding their impact on democratic processes . There is a special concern about the\\nease with which false information can be generated and disseminated, especially in critical contexts such as elections,\\nreferendums, political instability, war conflicts, or under dictatorial regimes. Furthermore, this potential negative impact\\nincludes the digital public sphere, where spreading hate speech on social networks [Institute for Human Rights and\\nBusiness, 2023] can lead to social fragmentation and bolster the manipulation of democratic institutions and social\\ncontrol. 
False information can be weaponized across various domains, from the stock market to information warfare\\nand propaganda.\\nIn the same vein, another relevant concern is privacy [International Journal of Human Rights Law Review, 2023, Ray,\\n2023, Paul et al., 2023, Deo, 2023, Lee, 2023, Bahrini et al., 2023, Mhlanga, 2023, C and J, 2023, Biswas, 2023a,\\nRobson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023b, Vidhya et al., 2023, Khan and Umer, 2023, Helberger and\\nDiakopoulos, 2023, Vallance, 2022, Khlaif, 2023, Paul et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nIskender, 2023]. Different aspects contribute to this concern, including the coverage of these models under AI Act\\nregulations, its extensive use of user data (particularly minors), the potential for surveillance applications, the data\\nvulnerability to cyber attacks targeting ChatGPT, and the privacy implications when integrating the model into various', metadata={'source': './pdf/2403.04667.pdf', 'page': 9}),\n",
       " Document(page_content='vulnerability to cyber attacks targeting ChatGPT, and the privacy implications when integrating the model into various\\ndomains such as education and the military. Privacy is a growing concern as illustrated by the case of the Italian state\\nban. In this case, the Italian authorities asked OpenAI to “expand its privacy policy for users and made it also accessible\\nfrom the sign-up page prior to registration with the service” [Garante per la protezione dei dati personali, 2023] in order\\nto operate in Italy. This case highlights the urgency for ensuring responsible and transparent use of the technology.\\nAnother important negative impact is the risk of job loss , as automation and AI capabilities advance [Davis, 2022, Deo,\\n2023, Rivas and Zhao, 2023, Lee and Yilmaz Soylu, 2023, Biddle, 2023, Robson, 2023, Khan and Umer, 2023, Air\\net al., 2023, Wolf, 2023, Curtis and ChatGPT§, 2023, Gabbiadini et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023].\\nFurthermore, other papers show apprehension about over-regulation [Helberger and Diakopoulos, 2023], highlighting\\nthe needed balance between ensuring ethical and responsible use of AI technologies while avoiding stifling innovation.\\nAdditionally, the model’s own cybersecurity is listed as a negative impact [Levente, 2023, Bahrini et al., 2023, C\\nand J, 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Khowaja et al., 2023].\\nThe comprehensive list of negative perceptions on ChatGPT impacts can be found in Table A1 in online appendix\\n[Baldassarre et al., 2023], providing further insights into the concerns found in the literature.\\nRQ2. 
What are the emerging trends perceived in ChatGPT development?\\nAn essential part of our review shows emerging trends [Hillemann and Zimprich, 2023, Telegraph, 2023, DiBenedetto,\\n2023, Deo, 2023, Equality Now, 2023, Institute for Human Rights and Business, 2023, Kumordzie, 2023, Vallance,\\n2022, Perrigo, 2023, Al Ashry, 2023, Bareis, 2023, Curtis and ChatGPT§, 2023, Rutinowski et al., 2023, Lee and\\nYilmaz Soylu, 2023, Rozado, 2023-03, George et al., 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas, 2023b,\\nVidhya et al., 2023, Suguri Motoki et al., 2023, Rivas and Zhao, 2023, Khlaif, 2023, Bahrini et al., 2023, Ray, 2023, Khan\\nand Umer, 2023, Ausat et al., 2023, Paul et al., 2023, Tamkin et al., 2021, Solaiman et al., 2019, Lee and Yilmaz Soylu,\\n10', metadata={'source': './pdf/2403.04667.pdf', 'page': 9}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\n2023, Geertsema et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023, Bruff, 2023]. Within this category we found uncertainty surrounding copyright , which is a prominent trend,\\nraising doubts about the possibility of profiting from chat-generated content and whether such content is subject to\\ncopyright protection. Resolving these issues requires legislative intervention, leading to a particular and complex debate\\namong authorities regarding the legal implications of AI-generated content [Hillemann and Zimprich, 2023]. There is\\nalso a growing demand for the enhancement of regulatory frameworks to safeguard original content, considering that\\nmodels like ChatGPT have the potential to negatively impact the work of scientists, writers, researchers, and artists\\n[Al Ashry, 2023, Ray, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023]. Notably, analyses such as by Khowaja et al. [Khowaja et al., 2023] raise crucial questions regarding ownership\\nrights over the data used to train the model and the ownership of the model itself. As mentioned earlier, there is a\\ndominant call for establishing ethical use guidelines , both at a general societal level [Gabbiadini et al., 2023] and\\nspecifically within educational and research institutions [Mhlanga, 2023, Khlaif, 2023, Ausat et al., 2023, Solaiman\\net al., 2019, Tamkin et al., 2021, Lee and Yilmaz Soylu, 2023, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023,\\nIskender, 2023, Stepanechko and Kozub, 2023, Bruff, 2023]. 
These guidelines will be pivotal in ensuring responsible\\nadoption models such as ChatGPT.\\nAnother emerging trend is developing transparency mechanisms [Equality Now, 2023, Lee and Yilmaz Soylu, 2023,\\nLee, 2023, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nStepanechko and Kozub, 2023]. Transparency is considered a vital strategy to address resistance toward adopting\\nthese models in various contexts [Bahrini et al., 2023] and to mitigate potential AI stigmatization. However, it is also\\nacknowledged that transparency poses challenges that need to be overcome [Khowaja et al., 2023].\\nAnother crucial issue to highlight is the accountability for potentially harmful uses of such technology [Deo, 2023,\\nBiswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and Kozub,\\n2023, Helberger and Diakopoulos, 2023], which involves both the end-users and the companies responsible for its\\ndevelopment [Helberger and Diakopoulos, 2023, Deo, 2023, Institute for Human Rights and Business, 2023, George\\net al., 2023, Biswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and\\nKozub, 2023]. This concern extends to applications in sensitive domains like the military [Deo, 2023]. Furthermore,\\nother several notable trends emerge, including the need for timely and appropriate regulation [Gabbiadini et al.,\\n2023, Bruff, 2023], the existence of political bias (29, 40, 42, 62), transformations within the AI market [Deo,\\n2023, Equality Now, 2023, Kumordzie, 2023, Perrigo, 2023, Tamkin et al., 2021, Farrell et al., 2022, Geertsema et al.,\\n2023], and the integration of renewable technologies and environmental awareness within this field [C and J, 2023,\\nInternational Journal of Human Rights Law Review, 2023].\\nRQ3. 
Which areas of improvement can be identified in the development of such technologies?\\nThe findings concerning areas for improvement within the field of generative AI present a diverse range of perspectives\\n[Hillemann and Zimprich, 2023, Helberger and Diakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023,\\nInstitute for Human Rights and Business, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al., 2023, Kumordzie,', metadata={'source': './pdf/2403.04667.pdf', 'page': 10}),\n",
       " Document(page_content='Institute for Human Rights and Business, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al., 2023, Kumordzie,\\n2023, Wolf, 2023, Vallance, 2022, Perrigo, 2023, Al Ashry, 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas,\\n2023a, Zhuo et al., 2023, Abdullah et al., 2022, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Solaiman et al., 2019, Tamkin et al., 2021, C and J, 2023, Tiunova and Muñoz,\\n2023, Khowaja et al., 2023, Gabbiadini et al., 2023, Gupta et al., 2023, Iskender, 2023, Stepanechko and Kozub, 2023,\\nBruff, 2023]. One prominent area is the examination of regulations [Hillemann and Zimprich, 2023, Helberger and\\nDiakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023, Institute for Human Rights and Business, 2023,\\nKumordzie, 2023, Wolf, 2023, Al Ashry, 2023, George et al., 2023, Levente, 2023, Lee and Yilmaz Soylu, 2023, Ray,\\n2023, Solaiman et al., 2019, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023, Bruff, 2023],\\nparticularly regarding whether the risk-based approach outlined in the AI Act effectively covers generative models\\n[Helberger and Diakopoulos, 2023, Wolf, 2023]. It is suggested that comprehensive guidelines should encompass the\\nentire spectrum, from its application to the AI Research and Development (R&D) [George et al., 2023]. Furthermore,\\nthere is a growing advocacy for a people-centred vision [Wolf, 2023, Kumordzie, 2023, Ray, 2023] that emphasizes\\nthe importance of human rights and ethical considerations in designing and implementing generative AI systems.\\nSpecifically, Solaiman et al. [Solaiman et al., 2019] examine the need to \"build frameworks for navigating trade-offs\"\\nand develop decision-making frameworks that account for the complexities and potential trade-offs associated with\\ngenerative AI. 
Similarly, Li [Levente, 2023] highlights the necessity to transform the European regulatory paradigm to\\neffectively address the challenges posed by LLMs.\\nAnother area of opportunity lies in addressing technical limitations within generative AI models [Levente, 2023, Curtis\\nand ChatGPT§, 2023, Sun and Hoelscher, 2023, Abdullah et al., 2022, Bahrini et al., 2023, Cao et al., 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Thirunavukarasu et al., 2023, Gupta\\net al., 2023, Iskender, 2023]. For instance, a significant challenge is the presence of fictional references [Tiunova\\n11', metadata={'source': './pdf/2403.04667.pdf', 'page': 10}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nand Muñoz, 2023] within the generated text, which hampers its reliability. Additionally, words repetition further\\naffects the produced text’s overall quality [Wolf, 2023]. Moreover, the phenomenon of inaccurate information, named\\n\"hallucinations \" [Wolf, 2023, Tamkin et al., 2021], and the lack of context understanding [Paul et al., 2023], are also\\nidentified as areas requiring attention and improvement. Furthermore, the literature highlights the need for enhanced\\nrisk mitigation mechanisms [Equality Now, 2023, Biddle, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al.,\\n2023, Vallance, 2022, Perrigo, 2023]. This entails refining processes for filtering potentially harmful responses\\n[Biddle, 2023], improving the quality and reliability of the data used to train the models [Levente, 2023], and\\nincorporating ethical guidelines for its development [Air et al., 2023], among other measures [Perrigo, 2023].\\nData governance is another significant area that requires attention [Deo, 2023, Ray, 2023, Khan and Umer, 2023,\\nTiunova and Muñoz, 2023]; this crucial task involves safeguarding sensitive information against security breaches,\\nunauthorized access, and information theft [Ray, 2023, Khan and Umer, 2023]. In the same vein, establishing clear\\nguidelines regarding the scope and limitations of information exchange with third parties is also paramount\\n[Tiunova and Muñoz, 2023].\\nOther areas of opportunity include the need for up-to-date data [Sun and Hoelscher, 2023, Zhuo et al., 2023, Abdullah\\net al., 2022, Gupta et al., 2023] to ensure the accuracy and relevance of generative AI models. 
However, this requirement\\npresents a trade-off between incorporating new data to improve performance or addressing data governance issues first.\\nThe literature review also highlights several other areas of opportunity for improvement; these include promoting\\nend-user responsibilities [Li, 2023], advocating for timely regulation [Li, 2023, Tiunova and Muñoz, 2023], and\\nraising awareness of environmental impact [Tiunova and Muñoz, 2023, Geertsema et al., 2023], among other\\nconsiderations.\\n7 CONCLUSION AND FUTURE WORK\\nWhile we are still in the early stages of evaluating the social impact of generative AI models, this systematic literature\\nreview allows us to gain initial insights into the perceptions surrounding their emergence in contemporary society.\\nOur analysis has revealed notable areas of concern, particularly privacy and the potential for bias . As generative\\nmodels continue to be adopted in diverse social contexts, addressing and mitigating issues related to inequality, bias,\\ndiscrimination, and stereotypes becomes urgent.\\nIn light of this, it is essential to note that generative AI reflects the social context in which it was created. As these\\nmodels are trained on data that captures various aspects of our reality, it becomes clear that addressing their flaws and\\nbiases requires a comprehensive understanding of the broader social context within which they operate. Rather than\\nsolely focusing on repairing the model, it is imperative to also engage in a critical examination of the social factors that\\ncontribute to these biases and limitations.\\nSome analyses argue that generative AI models, such as ChatGPT, are not intended to address social inequalities.\\nWhile this may be true, it is also essential that these models do not inadvertently contribute to exacerbating social\\nissues. Acknowledging that scientific breakthroughs do not occur within a social vacuum is critical. 
Therefore,\\nwe must foster a conscious, responsible, and ethically-driven progression of generative AI. Equally important is to\\nemphasize that generative AI models hold immense potential and offer substantial benefits across various fields and\\nsectors, including education, medicine, marketing, business, research, and science. Their impact extends beyond', metadata={'source': './pdf/2403.04667.pdf', 'page': 11}),\n",
       " Document(page_content='sectors, including education, medicine, marketing, business, research, and science. Their impact extends beyond\\ninnovation and significantly influences the legislative landscape. Consequently, policymakers need to address the\\nnecessity for appropriate regulation that not only addresses significant concerns associated with their use, but also\\nsupports and facilitates the ethical development of generative AI models [Clayton, 2023, Anthropic, b]. Undoubtedly,\\nAI generative models have reshaped our way of being in the world, triggering profound changes in our perception\\nand engagement with it. In our relentless pursuit to emulate human interaction, we have also confronted stereotypes,\\nbiases, and imperfections. Rather than succumbing to discouragement, we should use them as motivation to address\\nthem diligently and strive for continuous improvement. More work is required to develop more robust frameworks\\nand ethical guidelines, not only to improve accuracy and efficiency but also to ensure responsible deployment. As\\npart of future work, we propose evaluating ChatGPT regulation in the US, Europe, and Latin America. This analysis\\nwill examine how current legal tools address generative models’ challenges in particular locations. Additionally, to\\nunderstand ChatGPT users’ professional and social views, a survey has been designed and will be distributed among\\nprofessionals and researchers from diverse universities and research centers worldwide. With these exercises, we want\\nto gain a comprehensive understanding of its adoption, regulatory challenges, user perspectives, and deepening into its\\nsocial impact.\\n12', metadata={'source': './pdf/2403.04667.pdf', 'page': 11}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nReferences\\nJoe McKendrick. Learning to trust artificial intelligence: An optimist’s view,\\n2019. URL https://www.forbes.com/sites/joemckendrick/2019/06/09/\\nlearning-to-trust-artificial-intelligence-an-optimists-view/ .\\nHenry Farrell, Abraham Newman, and Jeremy Wallace. Spirals of delusion how al distorts decision-making and\\nmakes dictators more dangerous. 2022. ISSN 0015-7120. URL https://www.foreignaffairs.com/world/\\nspirals-delusion-artificial-intelligence-decision-making .\\nCindy Gordon. Will 2023 be the year that OpenAI’s ChatGPT breaks free?, 2022. URL https://www.forbes.com/\\nsites/cindygordon/2022/12/29/will-2023-be-the-year-that-openais-chatgpt-breaks-free/ .\\nTrevor J. Pinch and Wiebe E. Bijker. The social construction of facts and artefacts: Or how the sociology of\\nscience and the sociology of technology might benefit each other. 14(3):399–441, 1984. ISSN 0306-3127. URL\\nhttps://www.jstor.org/stable/285355 .\\nAcoustic Modeling Using Deep Belief Networks | IEEE Journals & Magazine | IEEE Xplore, 2012. URL https:\\n//ieeexplore.ieee.org/document/5704567 .\\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and\\nIllia Polosukhin. Attention Is All You Need, December 2017. URL http://arxiv.org/abs/1706.03762 .\\narXiv:1706.03762 [cs].\\nJacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Trans-\\nformers for Language Understanding, May 2019. URL http://arxiv.org/abs/1810.04805 . arXiv:1810.04805\\n[cs].\\nZhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. XLNet: Generalized\\nAutoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems ,\\nvolume 32. Curran Associates, Inc., 2019. 
URL https://proceedings.neurips.cc/paper_files/paper/\\n2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html .\\nDaniel Jurafsky and James Martin. Speech and Language Processing: An Introduction to Natural Language Processing,\\nComputational Linguistics, and Speech Recognition , volume 2. February 2008.\\nAlec Radford and Karthik Narasimhan. Improving language understanding by generative pre-training. 2018.\\nOpenAI. Openai, 2018. URL https://openai.com/research/language-unsupervised .\\nOpenAI. Gpt-4, a. URL https://openai.com/product/gpt-4 .\\nOpenAI. Openai publishing results refusal, b. URL https://cdn.openai.com/papers/gpt-4.pdf .\\nGoogle. Palm2, a. URL https://ai.google/discover/palm2 .\\nTogether. Gpt-jt. URL https://bit.ly/3TqjFes .\\nMeta. Llama. URL https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ .\\nOpenAI. Instructgpt, c. URL https://openai.com/research/instruction-following .\\nOpenAI. Chatgpt, d. URL https://openai.com/blog/chatgpt .\\nBing. Bing chat. URL https://www.bing.com/new .\\nGoogle. Bard, b. URL https://bard.google.com/ .\\nGoogle. Lamda, c. URL https://blog.google/technology/ai/lamda/ .\\nJasper. Jasper chat. URL https://www.jasper.ai/chat .\\nAnthropic. Claude, a. URL https://www.anthropic.com/index/claudes-constitution .\\nOpenAI DALL-E 2. Dall-e 2. URL https://openai.com/product/dall-e-2 .\\nStable Diffusion. Stable diffusion. URL https://stablediffusionweb.com/ .\\nWritesonic. Photosonic art generator. URL https://writesonic.com/photosonic-ai-art-generator .\\nRunway. Runway. URL https://research.runwayml.com/gen2 .\\nMeta Make-A-Video. Make-a-video. URL https://makeavideo.studio/ .\\nGoogle Imagen. Imagen video. URL https://imagen.research.google/ .\\nRuben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi\\nSaffar, Santiago Castro, Julius Kunze, and Dumitru Erhan. Phenaki. 
URL https://phenaki.video/ .\\n13', metadata={'source': './pdf/2403.04667.pdf', 'page': 12}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nBloomberg. Bloomberggpt. URL https://www.bloomberg.com/company/press/\\nbloomberggpt-50-billion-parameter-llm-tuned-finance/ .\\nSlack. Slackgpt. URL https://slack.com/intl/en-gb/blog/news/introducing-slack-gpt .\\nMicrosoft. Biogpt. URL https://github.com/microsoft/BioGPT .\\nGitHub. Copilot. URL https://github.com/features/copilot .\\nBruno Cartaxo, Gustavo Pinto, and Sergio Soares. The role of rapid reviews in supporting decision-making in software\\nengineering practice. In Proceedings of the 22nd International Conference on Evaluation and Assessment in Software\\nEngineering 2018 , EASE’18, page 24–34, New York, NY , USA, 2018. Association for Computing Machinery. ISBN\\n9781450364034. doi:10.1145/3210459.3210462. URL https://doi.org/10.1145/3210459.3210462 .\\nBarbara Kitchenham and Stuart Charters. Guidelines for performing systematic literature reviews in software engineer-\\ning. 2, 01 2007.\\nJan Piasecki, Marcin Waligora, and Vilius Dranseika. Google search as an additional source in systematic reviews.\\nScience and Engineering Ethics , 24, 12 2017. doi:10.1007/s11948-017-0010-4.\\nMaria Teresa Baldassarre, Danilo Caivano, Berenice Fernàndez Nieto, Domenico Gigante, and Azzurra Ragone.\\nhttps://figshare.com/s/77c3a667671472f8eccc, 2023. URL https://figshare.com/s/77c3a667671472f8eccc .\\nEuropean Commission. Proposal for a regulation of the european parliament and of the council laying down harmonised\\nrules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts, 2021. URL\\nhttps://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206 .\\nInternational Journal of Human Rights Law Review. ChatGPT and human rights: Nav-\\nigating the technological frontier, 2023. URL https://humanrightlawreview.in/\\nchatgpt-and-human-rights-navigating-the-technological-frontier/ .\\nPartha Pratim Ray. 
ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics,\\nlimitations and future scope. 3:121–154, 2023. ISSN 2667-3452. doi:10.1016/j.iotcps.2023.04.003. URL https:\\n//www.sciencedirect.com/science/article/pii/S266734522300024X .\\nJustin Paul, Akiko Ueno, and Charles Dennis. ChatGPT and consumers: Benefits, pitfalls and future research agenda.\\npage ijcs.12928, 2023. ISSN 1470-6423, 1470-6431. doi:10.1111/ijcs.12928. URL https://onlinelibrary.\\nwiley.com/doi/10.1111/ijcs.12928 .\\nDennis Hillemann and Stephan Zimprich. ChatGPT - legal challenges, legal opportunities, 2023. URL https:\\n//www.fieldfisher.com/en/insights/chatgpt-legal-challenges-legal-opportunities .\\nJessica Davis. ChatGPT: Enterprises eye use cases, ethicists remain con-\\ncerned, 2022. URL https://www.informationweek.com/big-data/\\nchat-gpt-enterprises-eye-use-cases-ethicists-remain-concerned .\\nSamantha Lock. What is AI chatbot phenomenon ChatGPT and could it replace humans?\\n2022. ISSN 0261-3077. URL https://www.theguardian.com/technology/2022/dec/05/\\nwhat-is-ai-chatbot-phenomenon-chatgpt-and-could-it-replace-humans .\\nPriyanka Deo. Is ChatGPT dangerous for humanity? 2023. ISSN 0971-8257. URL https://timesofindia.\\nindiatimes.com/business/india-business/is-chatgpt-dangerous-for-humanity/articleshow/\\n98471105.cms .\\nPablo Rivas and Liang Zhao. Marketing with ChatGPT: Navigating the ethical terrain of GPT-based chatbot technology.\\n4(2):375–384, 2023. ISSN 2673-2688. doi:10.3390/ai4020019. URL https://www.mdpi.com/2673-2688/4/2/\\n19.\\nDeladem Kumordzie. All you need to know about ChatGPT & why its\\na threat to google, 2023. URL https://medium.com/@cdkumordzie/\\nall-you-need-to-know-about-chatgpt-why-its-a-threat-to-google-fd2b887c8ff8 .\\nLevente. The pros and cons dark side of using chat GPT for businesses, 2023. URL https://medium.com/\\n@Levente22/the-pros-and-cons-dark-side-of-using-chat-gpt-for-businesses-cf2373119dab .\\nBernard Marr. 
What does ChatGPT really mean for your job?, 2023. URL https://www.forbes.com/sites/\\nbernardmarr/2023/02/13/what-does-chatgpt-really-mean-for-your-job/ .', metadata={'source': './pdf/2403.04667.pdf', 'page': 13}),\n",
       " Document(page_content='Bernard Marr. What does ChatGPT really mean for your job?, 2023. URL https://www.forbes.com/sites/\\nbernardmarr/2023/02/13/what-does-chatgpt-really-mean-for-your-job/ .\\nMalak Abdullah, Alia Madain, and Yaser Jararweh. ChatGPT: Fundamentals, applications and social impacts. In 2022\\nNinth International Conference on Social Networks Analysis, Management and Security (SNAMS) , pages 1–8, 2022.\\ndoi:10.1109/SNAMS58071.2022.10062688. ISSN: 2831-7343.\\n14', metadata={'source': './pdf/2403.04667.pdf', 'page': 13}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nSunder Ali Khowaja, Parus Khuwaja, and Kapal Dev. ChatGPT needs SPADE (sustainability, PrivAcy, digital divide,\\nand ethics) evaluation: A review, 2023. URL http://arxiv.org/abs/2305.03123 .\\nBulbul Gupta, Tabish Mufti, Shahab Saquib Sohail, and Dag Øivind Madsen. ChatGPT: A brief narrative review. 2023.\\ndoi:10.20944/preprints202304.0158.v1. URL https://www.preprints.org/manuscript/202304.0158/v1 .\\nDerek Bruff. Teaching in the artificial intelligence age of ChatGPT | teaching + learning lab, 2023. URL https:\\n//tll.mit.edu/teaching-in-the-artificial-intelligence-age-of-chatgpt/ .\\nAli Iskender. Holy or unholy? interview with open AI’s ChatGPT. 34:3414–3414, 2023. ISSN 1314-0817.\\ndoi:10.54055/ejtr.v34i.3169. URL https://ejtr.vumk.eu/index.php/about/article/view/3169 .\\nMuhammad Salar Khan and Hamza Umer. Chatgpt in finance: Addressing ethical challenges, 2023. URL https:\\n//papers.ssrn.com/abstract=4439967 .\\nHyunsu Lee. The rise of ChatGPT: Exploring its potential in medical education. page ase.2270, 2023. ISSN 1935-9772,\\n1935-9780. doi:10.1002/ase.2270. URL https://anatomypubs.onlinelibrary.wiley.com/doi/10.1002/\\nase.2270 .\\nGrace H. Sun and Stephanie H. Hoelscher. The ChatGPT storm and what faculty can do. 48(3):119,\\n2023. ISSN 0363-3624. doi:10.1097/NNE.0000000000001390. URL https://journals.lww.com/\\nnurseeducatoronline/Fulltext/2023/05000/The_ChatGPT_Storm_and_What_Faculty_Can_Do.1.\\naspx?context=FeaturedArticles&collectionId=5 .\\nAndré Guskow Cardoso. Do we need a chat-GPT-gov? the importance of technology for effective access to public\\ninformation., 2023. URL https://papers.ssrn.com/abstract=4365773 .\\nMuhammad Shidiq. The use of artificial intelligence-based chat-gpt and its challenges for the world of education;\\nfrom the viewpoint of the development of creative writing skills. 1(1):353–357, 2023. ISSN 2986-5832. 
URL\\nhttps://ejournal.unuja.ac.id/index.php/icesh/article/view/5614 .\\nJeonghyun Lee and Meryem Yilmaz Soylu. ChatGPT and assessment in higher education, 2023. URL\\nhttps://c21u.gatech.edu/sites/default/files/publication/2023/03/C21U%20ChatGPT%20White%\\n20Paper_Final.pdf .\\nDavid Mhlanga. Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning, 2023.\\nURL https://papers.ssrn.com/abstract=4354422 .\\nAram Bahrini, Mohammadsadra Khamoshifar, Hossein Abbasimehr, Robert J. Riggs, Maryam Esmaeili, Rastin Mastali\\nMajdabadkohne, and Morteza Pasehvar. ChatGPT: Applications, opportunities, and threats, 2023. URL http:\\n//arxiv.org/abs/2304.09103 .\\nAbu Muna Almaududi Ausat, Berdinata Massang, Mukhtar Efendi, Nofirman Nofirman, and Yasir Riady. Can chat GPT\\nreplace the role of the teacher in the classroom: A fundamental analysis. 5(4):16100–16106, 2023. ISSN 2654-5497.\\ndoi:10.31004/joe.v5i4.2745. URL https://www.jonedu.org/index.php/joe/article/view/2745 .\\nIrene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-V oss, Jeff Wu, Alec Radford, Gretchen\\nKrueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, and Jasmine\\nWang. Release strategies and the social impacts of language models, 2019. URL http://arxiv.org/abs/1908.\\n09203 .\\nAlla Tiunova and Felipe Muñoz. Chatgpt: Using ai in social studies academic research, 2023. URL https://papers.\\nssrn.com/abstract=4451612 .\\nPaul Geertsema, Albert Bifet, and Richard Green. ChatGPT and large language models: What are the implications for\\npolicy makers?, 2023. URL https://papers.ssrn.com/abstract=4424048 .\\nA.Shaji George, A.S.Hovan George, and A.S.Gabrio Martin. The environmental impact of AI: A case study of water\\nconsumption by chat GPT. 2023. doi:10.5281/ZENODO.7855594. URL https://zenodo.org/record/7855594 .\\nSom S Biswas. Role of chat gpt in public health. 
Annals of Biomedical Engineering , pages 1–2, 2023a.\\nSam Biddle. The internet’s new favorite AI proposes torturing iranians and surveilling mosques, 2023. URL https:', metadata={'source': './pdf/2403.04667.pdf', 'page': 14}),\n",
       " Document(page_content='Sam Biddle. The internet’s new favorite AI proposes torturing iranians and surveilling mosques, 2023. URL https:\\n//theintercept.com/2022/12/08/openai-chatgpt-ai-bias-ethics/ .\\nDavid C and Paul J. ChatGPT and large language models: what’s the risk?, 2023. URL https://www.ncsc.gov.uk/\\nblog-post/chatgpt-and-large-language-models-whats-the-risk .\\nEquality Now. ChatGPT-4 reinforces sexist stereotypes by stating a girl cannot “handle technicali-\\nties and numbers” in engineering, 2023. URL https://www.equalitynow.org/news_and_insights/\\nchatgpt-4-reinforces-sexist-stereotypes/ .\\n15', metadata={'source': './pdf/2403.04667.pdf', 'page': 14}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nKurt Robson. Do AI chatbots like ChatGPT pose a major cybersecurity risk?, 2023. URL https://www.verdict.\\nco.uk/do-ai-chatbots-like-chatgpt-pose-a-major-cybersecurity-risk/ .\\nBilly Perrigo. Exclusive: The $2 per hour workers who made ChatGPT safer, 2023. URL https://time.com/\\n6247678/openai-chatgpt-kenya-workers/ .\\nZihao Li. The dark side of ChatGPT: Legal and ethical challenges from stochastic parrots and hallucination, 2023.\\nURL http://arxiv.org/abs/2304.14347 .\\nN. Gowri Vidhya, D. Devi, Nithya A, and T. Manju. Prognosis of exploration on chat GPT with artificial\\nintelligence ethics. 2(9):60–69, 2023. ISSN 2764-3417. doi:10.14295/bjs.v2i9.372. URL https://www.\\nbrazilianjournalofscience.com.br/revista/article/view/372 .\\nAlex Tamkin, Miles Brundage, Jack Clark, and Deep Ganguli. Understanding the capabilities, limitations, and societal\\nimpact of large language models, 2021. URL http://arxiv.org/abs/2102.02503 .\\nChristoph Treude and Hideaki Hata. She elicits requirements and he tests: Software engineering gender bias in large\\nlanguage models, 2023. URL http://arxiv.org/abs/2303.10131 .\\nArun James Thirunavukarasu, Refaat Hassan, Shathar Mahmood, Rohan Sanghera, Kara Barzangi, Mohanned El\\nMukashfi, and Sachin Shah. Trialling a large language model (ChatGPT) in general practice with the applied\\nknowledge test: Observational study demonstrating opportunities and limitations in primary care. 9(1):e46599, 2023.\\ndoi:10.2196/46599. URL https://mededu.jmir.org/2023/1/e46599 .\\nOksana Stepanechko and Liubov Kozub. English teachers’ concerns about the ethical use of chat GPT by university\\nstudents. (25):297–302, 2023. ISSN 2710-3056. doi:10.36074/grail-of-science.17.03.2023.051. URL https:\\n//archive.journal-grail.science/index.php/2710-3056/article/view/1040 .\\nNatali Helberger and Nicholas Diakopoulos. ChatGPT and the AI act. 12(1), 2023. 
ISSN 2197-6775.\\ndoi:10.14763/2023.1.1682. URL https://policyreview.info/essay/chatgpt-and-ai-act .\\nChristopher Air, Shanaka Wijetunge, and Alexander Dimitrov. The ethics of AI: The cyber risks\\nposed by chat GPT, 2023. URL https://www.dacbeachcroft.com/en/gb/articles/2023/february/\\nthe-ethics-of-ai-the-cyber-risks-posed-by-chat-gpt .\\nZachary B. Wolf. AI can be racist, sexist and creepy. what should we do about it? | CNN politics, 2023. URL\\nhttps://www.cnn.com/2023/03/18/politics/ai-chatgpt-racist-what-matters/index.html .\\nChris Vallance. ChatGPT: New AI chatbot has everyone talking to it. 2022. URL https://www.bbc.com/news/\\ntechnology-63861322 .\\nDavid Rozado. The political biases of ChatGPT. 12(3):148, 2023-03. ISSN 2076-0760. doi:10.3390/socsci12030148.\\nURL https://www.mdpi.com/2076-0760/12/3/148 .\\nInstitute for Human Rights and Business. We asked ChatGPT about its impact on human rights and business. here’s what\\nit told us, 2023. URL https://www.ihrb.org/focus-areas/information-communication-technology/\\nwe-asked-chatgpt-about-its-impact-on-human-rights-on-business-heres-what-it-told-us .\\nSom Biswas. Prospective role of chat GPT in the military: According to ChatGPT. 2023b. ISSN 2632-3834.\\ndoi:10.32388/8WYYOD. URL https://www.qeios.com/read/8WYYOD .\\nZuheir N. Khlaif. Ethical concerns about using AI-generated text in scientific research, 2023. URL https://papers.\\nssrn.com/abstract=4387984 .\\nGarante per la protezione dei dati personali. ChatGPT: OpenAI riapre la piattaforma in italia garantendo più trasparenza\\ne più diritti a utenti e non utenti europei, 2023. URL https://www.garanteprivacy.it:443/home/docweb/-/\\ndocweb-display/docweb/9881490 .\\nNigel Curtis and ChatGPT§. To chatgpt or not to chatgpt? the impact of artificial intelligence on academic publishing.\\n42(4):275, 2023. ISSN 0891-3668. doi:10.1097/INF.0000000000003852. 
URL https://journals.lww.com/\\npidj/Citation/2023/04000/To_ChatGPT_or_not_to_ChatGPT__The_Impact_of.1.aspx .\\nAlessandro Gabbiadini, Ognibene Dimitri, Cristina Baldissarri, and Anna Manfredi. Does ChatGPT pose a threat to', metadata={'source': './pdf/2403.04667.pdf', 'page': 15}),\n",
       " Document(page_content='pidj/Citation/2023/04000/To_ChatGPT_or_not_to_ChatGPT__The_Impact_of.1.aspx .\\nAlessandro Gabbiadini, Ognibene Dimitri, Cristina Baldissarri, and Anna Manfredi. Does ChatGPT pose a threat to\\nhuman identity?, 2023. URL https://papers.ssrn.com/abstract=4377900 .\\nTech Telegraph. Microsoft, OpenAI, alphabet and big tech are ignoring the human cost behind the rise of ChatGPT and\\nother AI-powered chatbots. 2023. ISSN 0307-1235. URL https://bit.ly/3uXvdwp .\\nChase DiBenedetto. ChatGPT’s surprisingly human voice came with a human cost, 2023. URL https://mashable.\\ncom/article/chat-gpt-open-ai-workers-exploitation .\\n16', metadata={'source': './pdf/2403.04667.pdf', 'page': 15}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nAyman Al Ashry. Chat GPT and its legal impact on society as a new form of AI - copy-\\nright - united arab emirates, 2023. URL https://www.mondaq.com/copyright/1299882/\\nchat-gpt-and-its-legal-impact-on-society-as-a-new-form-of-ai .\\nJascha Bareis. We are scared of the question chat-GPT cannot answer. because the answer is too obvious., 2023. URL\\nhttps://papers.ssrn.com/abstract=4410324 .\\nJérôme Rutinowski, Sven Franke, Jan Endendyk, Ina Dormuth, and Markus Pauly. The self-perception and political\\nbiases of ChatGPT, 2023. URL http://arxiv.org/abs/2304.07333 .\\nFabio Suguri Motoki, Valdemar Pinho Neto, and Victor Rodrigues. More human than human: Measuring ChatGPT\\npolitical bias. 2023. doi:10.2139/ssrn.4372349. URL https://ueaeprints.uea.ac.uk/id/eprint/91668/ .\\nWeeTech Solution. What is ChatGPT and the benefits of using ChatGPT, 2023. URL https://www.\\nweetechsolution.com/blog/what-is-chat-gpt-and-the-advantages-of-using-chat-gpt .\\nTerry Yue Zhuo, Yujin Huang, Chunyang Chen, and Zhenchang Xing. Exploring AI ethics of ChatGPT: A diagnostic\\nanalysis, 2023. URL http://arxiv.org/abs/2301.12867 .\\nAdam Sobieszek and Tadeusz Price. Playing games with ais: The limits of GPT-3 and similar large language models.\\n32(2):341–364, 2022. ISSN 1572-8641. doi:10.1007/s11023-022-09602-0. URL https://doi.org/10.1007/\\ns11023-022-09602-0 .\\nYihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun. A comprehen-\\nsive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. 2023.\\ndoi:10.48550/ARXIV .2303.04226. URL https://arxiv.org/abs/2303.04226 .\\nJames Clayton. Sam altman: CEO of OpenAI calls for US to regulate artificial intelligence. 2023. URL https:\\n//www.bbc.com/news/world-us-canada-65616866 .\\nAnthropic. Anthropic raises $450 million in series c funding to scale reliable. . . , b. 
URL https://www.anthropic.\\ncom/index/anthropic-series-c .\\n17', metadata={'source': './pdf/2403.04667.pdf', 'page': 16})]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "1b6a88bb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content='THE SOCIAL IMPACT OF GENERATIVE AI: A NANALYSIS ON\\nCHATGPT\\nA P REPRINT\\nMaria T. Baldassarre\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nmariateresa.baldassarre@uniba.it\\nDanilo Caivano\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndanilo.caivano@uniba.it\\nBerenice Fernández Nieto\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nberenice.fernandeznieto@uniba.it\\nDomenico Gigante\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndomenico.gigante1@uniba.it\\nAzzurra Ragone\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nazzurra.ragone@uniba.it\\nSeptember, 2023\\nABSTRACT\\nIn recent months, the social impact of Artificial Intelligence (AI) has gained considerable public inter-\\nest, driven by the emergence of Generative AI models, ChatGPT in particular. The rapid development\\nof these models has sparked heated discussions regarding their benefits, limitations, and associated\\nrisks. Generative models hold immense promise across multiple domains, such as healthcare, finance,\\nand education, to cite a few, presenting diverse practical applications. Nevertheless, concerns about\\npotential adverse effects have elicited divergent perspectives, ranging from privacy risks to escalating\\nsocial inequality. This paper adopts a methodology to delve into the societal implications of Genera-\\ntive AI tools, focusing primarily on the case of ChatGPT. It evaluates the potential impact on several\\nsocial sectors and illustrates the findings of a comprehensive literature review of both positive and\\nnegative effects, emerging trends, and areas of opportunity of Generative AI models. 
This analysis\\naims to facilitate an in-depth discussion by providing insights that can inspire policy, regulation, and\\nresponsible development practices to foster a human-centered AI.\\nKeywords AI Social Impact ·ChatGPT Social Impact ·Human-centered AI ·Perceptions on ChatGPT ·AI Social\\nConcern\\n1 Introduction\\nIn recent months, the social impact of Artificial Intelligence (AI) has been at the forefront of public debate due\\nprimarily to the introduction of new software systems and technologies, specifically ChatGPT. The rapid development\\nof these technologies has, even more, sparked the debate regarding the advantages, limitations, and risks of Artificial\\nIntelligence’s expanding capabilities. From healthcare to cybersecurity, generative models offer a vast array of practical\\nand prosperous future possibilities. However, concerns regarding potential adverse effects present an opposing viewpoint,arXiv:2403.04667v1  [cs.AI]  7 Mar 2024', metadata={'source': './pdf/2403.04667.pdf', 'page': 0})"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
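   ],
   "source": [
    "# Each chunk records its source page in metadata['page'], so a chunk can be traced back to its page.\n",
    "# load_and_split() may emit several chunks per page, so the list index is a chunk index, not a page number.\n",
    "pages[0]"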
   ]
  },
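  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "155f2534",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import ChatOpenAI, OpenAI\n",
    "\n",
    "# Connect to an OpenAI-compatible server on localhost.\n",
    "# A placeholder key is used, assuming the local server does not validate it.\n",
    "openai_api_key = \"EMPTY\"\n",
    "openai_api_base = \"http://127.0.0.1:1234/v1\"\n",
    "model = ChatOpenAI(\n",
    "    openai_api_key=openai_api_key,\n",
    "    openai_api_base=openai_api_base,\n",
    "    temperature=0.3,  # low temperature for more deterministic output\n",
    ")\n"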
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "dfec83e2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'THE SOCIAL IMPACT OF GENERATIVE AI: A NANALYSIS ON\\nCHATGPT\\nA P REPRINT\\nMaria T. Baldassarre\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nmariateresa.baldassarre@uniba.it\\nDanilo Caivano\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndanilo.caivano@uniba.it\\nBerenice Fernández Nieto\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nberenice.fernandeznieto@uniba.it\\nDomenico Gigante\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndomenico.gigante1@uniba.it\\nAzzurra Ragone\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nazzurra.ragone@uniba.it\\nSeptember, 2023\\nABSTRACT\\nIn recent months, the social impact of Artificial Intelligence (AI) has gained considerable public inter-\\nest, driven by the emergence of Generative AI models, ChatGPT in particular. The rapid development\\nof these models has sparked heated discussions regarding their benefits, limitations, and associated\\nrisks. Generative models hold immense promise across multiple domains, such as healthcare, finance,\\nand education, to cite a few, presenting diverse practical applications. Nevertheless, concerns about\\npotential adverse effects have elicited divergent perspectives, ranging from privacy risks to escalating\\nsocial inequality. This paper adopts a methodology to delve into the societal implications of Genera-\\ntive AI tools, focusing primarily on the case of ChatGPT. It evaluates the potential impact on several\\nsocial sectors and illustrates the findings of a comprehensive literature review of both positive and\\nnegative effects, emerging trends, and areas of opportunity of Generative AI models. 
This analysis\\naims to facilitate an in-depth discussion by providing insights that can inspire policy, regulation, and\\nresponsible development practices to foster a human-centered AI.\\nKeywords AI Social Impact ·ChatGPT Social Impact ·Human-centered AI ·Perceptions on ChatGPT ·AI Social\\nConcern\\n1 Introduction\\nIn recent months, the social impact of Artificial Intelligence (AI) has been at the forefront of public debate due\\nprimarily to the introduction of new software systems and technologies, specifically ChatGPT. The rapid development\\nof these technologies has, even more, sparked the debate regarding the advantages, limitations, and risks of Artificial\\nIntelligence’s expanding capabilities. From healthcare to cybersecurity, generative models offer a vast array of practical\\nand prosperous future possibilities. However, concerns regarding potential adverse effects present an opposing viewpoint,arXiv:2403.04667v1  [cs.AI]  7 Mar 2024The Social Impact of Generative AI A P REPRINT\\nwith arguments spanning from privacy risks to deepening social inequalities. Models like ChatGPT personalize the\\ndigital version of the Delphic oracle, where people expect to find answers to their current problems by automating\\ntasks, seeking ChatGPT’s opinion on various issues, and even requesting advice. Nonetheless, it is essential to question\\nwhether we are genuinely resolving uncertainties or uncovering new ones regarding the scope, boundaries, and prospects\\nof generative models’ societal impact. Some other analysts have referred to an \"AI arms race\" in which companies\\nworldwide strive to showcase the best technology, innovation prowess, and leadership in the AI market. On the\\nother side of the debate, discussions refer to the rapid development of these models and the effectiveness of existing\\nlegal frameworks in safeguarding against unintended adverse outcomes. 
Amongst all these debates, unquestionably,\\nGenerative AI is currently undergoing a period of accelerated evolution. This evolution inevitably brings about a social\\nimpact akin to the ones experienced through numerous other technological advancements that transformed our society\\nin the past. To date, significant effects have been observed in service provision, education, and scientific analysis.\\nHowever, more profound and concerning impacts also unfold in domains like democracy, inequality, security, and\\nmilitary technology.\\nConsequently, a comprehensive examination and analysis are required to understand the positive and negative social\\nconsequences, emerging trends, and areas of improvement of generative models. These studies are needed to address\\npotential vulnerabilities and ensure the development of these technologies considers the diverse social contexts and\\nrealities in which they are deployed.\\nBuilding upon the preceding insights, this analysis adopts a comprehensive approach to explore the societal ramifications\\nand future trajectories of Generative AI, with a specific emphasis on ChatGPT. The analysis is organized as follows:\\nSection 2 examines the potential impacts of ChatGPT across diverse social sectors and the evolution of the debate\\nacross various spheres; Section 3 provides a brief overview of the state-of-the-art of Generative AI models as well\\nas a classification of these; Section 4 presents the study design, goals, and Research Questions, plus the search\\nstrategy. Section 5 includes the data analysis and synthesis, drawing conclusions from the literature review, and sharing\\nvisualizations of our findings. Conclusion and future work close the paper.\\n2 BACKGROUND\\nThroughout history, the advancement of technology has consistently brought about significant transformations in social\\ndynamics. Each new technological breakthrough has sparked debates regarding scientific progress’ advantages and\\npotential hazards. 
Currently, this discourse encompasses various automated tools, data collection and analysis, and the\\ndigitization of services, among other emerging applications, which have become integral parts of modern-day life.\\nThese novel technological applications pervade numerous societal domains, ranging from education to diplomacy,\\nexerting a profound influence and continuously reshaping various social processes. While there is a prevailing and\\noptimistic belief in the positive impact of technology on human progress [McKendrick, 2019], it is also becoming\\nincreasingly apparent that these disruptive forces could potentially engender unforeseen and unintended consequences.\\nIn informatics, sociology, philosophy, and politics, the development of generative models will continue to ignite\\nin-depth discussions on various subjects. These discussions include topics such as regulation, risk mitigation, liability,\\ntransparency, and accountability, as well as the effects on socialization patterns and the trajectory of technological\\ndevelopment itself.transparency, and accountability, as well as the effects on socialization patterns and the trajectory of technological\\ndevelopment itself.\\nIn our case, the significance of examining the social impact of ChatGPT stems from its potential to cause significant\\nsocial transformations, despite ongoing debates regarding the magnitude of these changes. ChatGPT is a powerful\\ngenerative model that may impact power dynamics at multiple scales, from individual interactions to broader social\\nstructures [Farrell et al., 2022]. This dynamic occurs in complex social environments marked by disparities, stereotypes,\\nconflicts, and various political and social organization forms. These diverse social contexts, which surround scientific\\nadvances, trigger unpredictable and immeasurable consequences that fundamentally transform how we interact with\\neach other and the world. 
When a disruptive force permeates a new environment, it encounters various forms of\\nresistance. It triggers unintended negative consequences, demands protection from potential vulnerabilities, causes\\nuncertainty about whether it can be regulated, and other related factors. This critical-resistant front contrasts with\\na skeptical perception that questions the gravity of this new disruption and the alarmist interpretations surrounding\\nemerging technologies, in this case. A third front embodies an optimistic outlook, emphasizing the manifold benefits\\nacross multiple sectors, fostering enthusiasm for potential advancements and enhancements, and envisioning their\\npotential to address significant challenges. Overall, the interaction between these forms of resistance shapes perceptions\\nof generative models’ societal impact and raises critical questions about their implications for the broader social fabric.\\n2The Social Impact of Generative AI A P REPRINT\\nThese perspectives, as human behavior changes over time, interact and mutually influence and foster one another. In the\\ncase of ChatGPT, we also observe these attitudes, including optimism, pessimism, and skepticism, but the panorama is\\neven more complex as we present below in our literature review.\\nAs evidence of the increasing interest in ChatGPT and AI, Figure 1 depicts the fluctuations in Google search queries for\\n\"ChatGPT\" globally from May 2020 (the release of ChatGPT) until May 2023 (which includes the time frame of our\\nresearch).\\nFigure 1: Search queries evolution on “ChatGPT” from May 2020 to May 2023 via Google Trends ( https://trends.\\ngoogle.it/trends/explore?date=2020-04-28%202023-05-12&q=Chat%20GPT3&hl=en ). 
The \"note\" in the\\ngraph reflects an improvement in Google’s data collection system implemented on 1 January 22.\\nFig.1, furthermore, depicts a consistent low level of interest until December 2022, when a significant shift occurred,\\ncoinciding with the months following the launch of Open AI’s ChatGPT as a prototype service on 30 November 2022\\n[Gordon, 2022], which attracted global attention. In April and May of 2023, interest peaked.\\nAs we enter the third wave of AI evolution, examining how socialization processes change and assessing scientific\\ninnovations’ potential positive or negative effects on rights and freedoms is critical. Following Pinch and Bijker [Pinch\\nand Bijker, 1984], it is essential to examine the construction of scientific knowledge across different localities and\\ncontexts. As a result, the primary goal of this analysis is to assess the evolution of ChatGPT in recent years and to\\nexamine its current perceived impact on various social aspects, in the context of the ongoing wave of AI evolution.\\n3 STATE OF THE ART\\nGenerative pre-training (GP) was a well-known concept in machine learning applications since 2012 [noa, 2012].\\nLater, in 2017 Google introduced the transformer architecture [Vaswani et al., 2017]. These advancements led to the\\nbirth of large language models like BERT in 2018 [Devlin et al., 2019] and XLNet in 2019 [Yang et al., 2019]: these are\\npre-trained transformers (PT) but are not designed to be generative. A language model is a probability distribution over\\nsequences of words [Jurafsky and Martin, 2008]: given any sequence of words of length m, a language model assigns a\\nprobability to the whole sequence. A Large Language Model (LLM) is a language model based on a neural network\\nwith many parameters (typically billions or more). 
Prior to transformer-based architectures, the best-performing neural\\nNatural Language Processing (NLP) models employed supervised learning from large amounts of manually-labeled\\ndata.\\nThe main drawbacks of using supervised learning are the impossibility to use it on not well-annotated datasets, and\\nalso the prohibitive cost and time required to train extremely large language models [Radford and Narasimhan, 2018].\\nUsually, LLMs trained on a large quantity of data can perform discretely a good number of tasks; anyway, they can be\\nfine-tuned (i.e., further trained on specific data) to execute a specific task with better performance.\\nLater, in 2018 OpenAI [OpenAI, 2018] published its famous article \" Improving Language Understanding by Generative\\nPre-Training \", in which the first Generative Pre-trained Transformer (GPT) system was introduced [Radford and\\nNarasimhan, 2018]. GPT is a type of large language model (LLM) used mainly for Generative AI, which is a type of AI\\ncapable of generating various kinds of content, such as text and images, in response to instructions (also known as\\nprompts). Generative AI models learn the patterns and structure of their input training data and then generate new data\\nthat has similar characteristics, according to what has been asked as a prompt.\\n3The Social Impact of Generative AI A P REPRINT\\n3.1 Foundational models\\nA foundational model is an AI model trained on broad data at scale such that it can be adapted to a wide range of\\ndownstream tasks. The most famous and performant GPT foundation models are the ones released by OpenAI. The\\nmost recent is GPT-4 [OpenAI, a], for which OpenAI refused to publish the size and training details due to business\\nreasons [OpenAI, b].\\nOther such models include Google’s PaLM [Google, a] and Together’s GPT-JT [Together], which has been reported as\\nthe closest-performing open-source alternative to GPT-3. 
Meta as well has released a generative foundational language\\nmodel, named LLaMA [Meta]. Foundational GPTs can also handle media (and not only text), both for input and/or\\noutput. For example, GPT-4 is capable of processing text and images as input but only produces text as output.\\n3.2 Task-Specific Models\\nA foundational GPT model is usually further trained to better perform specific tasks and/or handle subject-matter\\ndomains. One of the most used methods for such adaptation is fine-tuning (beyond that done for the foundation model).\\nFine-tuning is an approach in which the weights of a pre-trained (language) model are trained on new data. One example\\nof this is fine-tuning LLM to comprehend and follow instructions: in January 2022, OpenAI introduced InstructGPT\\n[OpenAI, c] –a series of models which were fine-tuned to follow instructions. The gained advantages included higher\\naccuracy, less negative/toxic sentiment, and generally better alignment with user needs. Other examples of task-specific\\nmodels are chatbots, AI systems that engage in human-like conversation; ChatGPT [OpenAI, d] is currently the most\\nfamous chatbot. Anyway, other major chatbots currently exist, such as Microsoft’s Bing Chat [Bing] – which uses\\nOpenAI’s GPT-4 [OpenAI, a] (as part of a broader close collaboration between OpenAI and Microsoft) –, Google’s\\ncompeting chatbot Bard [Google, b] and LaMDA [Google, c], Jasper Chat [Jasper], Claude [Anthropic, a].\\nFinally, the text-to-model task is becoming quite popular. To date, some famous models are Dall-E 2 [2],Stable\\nDiffusion [Diffusion], PhotoSonic Art Generator [Writesonic] whose task is the production of images based on\\nuser-provided textual prompts. Following the text-to-image models, also the text-to-video task has been addressed\\nwith a lot of models, such as: Runway [Runway], Meta’s Make-A-Video [Make-A-Video], Google’s Imagen Video\\n[Imagen] and Phenaki [Villegas et al.]. 
All these models can generate video from text and/or text/image prompts.\\n3.3 Domain-specific models\\nGPT systems can be re-trained to address particular fields or domains. Examples of such models (and apps) are\\nBloombergGPT [Bloomberg] for the financial domain, which should provide help with financial news and information,\\nSlackGPT [Slack] to support the Slack instant-messaging service by providing help and guidance with navigating and\\nsummarizing discussions (based on OpenAI’s API), BioGPT [Microsoft] for the biomedical domain, to provide help\\nwith biomedical literature text generation and mining, CoPilot [GitHub] for the IT source code development domain, to\\nprovide auto-completion capabilities for developers.\\nSometimes domain-specificity is realized via software components, specifically named plug-ins or add-ons. For\\nexample, Google Workspace has available add-ons such as GPT for Sheets and Docs – which is reported to aid the use\\nof spreadsheet functionality in Google Sheets.\\n4 STUDY DESIGN\\nTo perform this review, we followed the protocol proposed in [Cartaxo et al., 2018], and we completed the review\\nprocess with the strategies presented in [Kitchenham and Charters, 2007] for performing systematic literature reviews.\\nThe following subsections describe in detail the study design and its execution. The literature review presented in this\\nwork was carried out through the following steps:The following subsections describe in detail the study design and its execution. 
The literature review presented in this\\nwork was carried out through the following steps:\\n1.Goal and Research questions : the goal and the correlated research questions were identified to guide the\\nliterature review;\\n2.Search strategy : defining the strategy to collect previous works published in the literature, including research\\ndatabases and query strings;\\n3.Eligibility criteria definition : the criteria used to filter the collected studies have been defined;\\n4.Data extraction : defining how relevant data were extracted to help answer the research questions;\\n5.Data analysis and synthesis : defining how to organize extracted relevant data to answer the research questions.\\n4The Social Impact of Generative AI A P REPRINT\\nFig. 2 summarizes the review protocol.\\nFigure 2: Research protocol used in the literature review\\n4.1 Goal and Research Question Definition\\nWe formulated the following research questions to analyze the diverse dimensions of ChatGPT’s impact.\\nBased on this goal, we defined the following research questions:\\n•RQ1 : What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\n•RQ2 : What are the emerging trends perceived in ChatGPT development?\\n•RQ3 : Which areas of improvement can be identified in the development of such technologies?\\n4.2 Search strategy\\nThe research has been split into two distinct parts: part one , focused on grey literature, and part two , focused on white\\nliterature. Including both grey literature and academic contributions allowed us to conduct a deeper exploration of the\\nimpact of ChatGPT in various settings and from various perspectives.\\nIn part one – started on 29th November 2023 – we focused on grey-literature sources, like blog posts and news articles\\nfrom multiple domains – such as business, education, technology, and society – emphasizing ChatGPT. 
Here our goal\\nwas to feel the sentiment of the media and tech sphere and capture feelings on ChatGPT that cannot emerge from\\nwhite-literature sources. Furthermore, some pilot searches on white-literature sources produced very few results, which\\ndid not allow us to derive reliable and consistent conclusions.\\nThe string used for part one was:\\n(“ChatGPT” AND “social concerns”) (“ChatGPT” AND “social impact”) (“ChatGPT” AND “Human rights”) (“ChatGPT”\\nAND “society*”) (“ChatGPT” AND “education*”) (“ChatGPT” AND “ethics”)\\nThis string was used to perform a keyword-based search on Google search engine1; the search was performed in\\nprivate-browsing mode, after logging out from personal accounts and erasing all web cookies and history [Piasecki\\net al., 2017].\\nIn the end, this search resulted in 1230 literature sources.\\nWhile executing part one of the research, we continued executing periodic pilot searches for white-literature sources.\\nIn late February 2023, we noticed a notable change: the number of scientific articles addressing ChatGPT increased\\nsignificantly, encompassing diverse approaches and perspectives. This may be due to the fact that academic papers need\\nmore time to be reviewed by peers and published (w.r.t. blog posts and news articles).\\nSo, for part two of the research – started on 22nd February 2023 – we decided to use Google Scholar2as white-literature\\nsearch engine; here we searched for scientific articles of various fields – such as business, education, technology, society,\\nhealthcare. 
The string used for part two of the research was the following:\\n1https://www.google.it/\\n2https://scholar.google.com/\\n5The Social Impact of Generative AI A P REPRINT\\n(\"Large Language Model\" AND \"Social impact\") (\"Large Language Model\" AND Human Rights\") (\"Large Language\\nModel\" OR \"ChatGPT\" AND \"Ethics\")(\"Large Language Model\" AND \"ChatGPT\" AND \"Ethics\") (\"Large Language\\nModel\" OR \"ChatGPT\" AND \"Social concerns\") (\"Large Language Model” AND \"ChatGPT\" AND \"society*\") (\"Large\\nLanguage Model\" AND \"ChatGPT\" AND \"education*\") (\"Chat GPT\" AND \"social concerns\") (\"ChatGPT\" AND \"social\\nimpact\") (\"ChatGPT\" AND \"Human rights\") (\"ChatGPT\" AND \"society*\") (\"ChatGPT\" AND \"education*\") (\"ChatGPT\"\\nAND \"ethics\")\\nIn the end, this search resulted in 86 new literature sources.\\nAll the documents obtained with this search strategy (both in part one and part two) were surveyed using a 3-stages\\ninformation classification process. In the first stage, only the title and keywords of the collected articles were read. In\\nthe second stage, we analyzed the abstract of each article while in the third stage we read the complete article. All these\\nstages were conducted separately and in blind-view way by two of the authors. In case of a disagreement, a third author\\nmanually verified and took the final decision.\\nAll found publications were subjected to the selection criteria outlined in Sec. 
4.3 to determine their relevance for\\ninclusion in the analysis.\\n4.3 Eligibility Criteria Definition and Data Extraction\\nThe selection procedure used for filtering the identified pool of 1316 papers was based on the following criteria:\\n• for every text the content should be mainly related to ChatGPT and LLM,\\n• the content should be written in English.\\nThen, the following criteria helped us to ensure we only included publications providing substantial information to our\\nanalysis, especially on ChatGPT, while we excluded papers with only brief mentions or tangential references:\\n•Forblog posts , we required the author’s name to be consistently provided, and the blogs should be specialized\\nin the relevant subject matter.\\n•Regarding news articles , we preferred those that offered an extensive analysis. We adopted this criterion\\nbecause initial news coverage of ChatGPT tended to be repetitive, often focusing on its capabilities and\\nlimitations and just providing a brief history of the model.\\n•In the case of academic articles , we sought diverse approaches to ensure a comprehensive perspective rather\\nthan lean solely on a single field, such as, for instance, the impact of ChatGPT in education.\\nAfter applying these selection criteria, we selected a total of 71papers from our initial pool of 1316 articles.\\nIn the Data Extraction step, we extracted all relevant data that could help answer any of the research questions. The\\nextraction process was performed by two of the authors and conflicts were solved by a third author in a blind-view way.\\nWe used Atlas.ti3to tabulate and organize data. More detailed information regarding the data and how it was indexed\\ncan be found in the online appendix [Baldassarre et al., 2023].\\n5 DATA ANALYSIS AND SYNTHESIS\\nPart one of the search has been conducted from 29th November 2022 to 22nd February 2023. Part two has been\\nconducted from 22nd February 2023 to 19th May 2023. 
Table 1 details the results obtained in both parts, as well as the\\ndocuments selected once our selection criteria were applied.\\nResearch phase Resources\\nretrievedResources\\nanalyzedResources\\nselected\\nFirst part, until Feb. 22 on Google 1230 300 25\\nSecond part, until May 19 on Google scholar 86 63 46\\nTable 1: Amount of documents collected grouped by research phase.\\n3https://atlasti.com/\\n6The Social Impact of Generative AI A P REPRINT\\nIn order to answer RQ1, RQ2 and RQ3 we performed an analysis using Atlas.ti. Atlas.ti is a qualitative research tool\\nthat enables the systematic organization of documentary resources by using codes and creating documentary categories.\\nAtlas.ti allowed us to visualize and intuitively present content trends within the analyzed documents. We employed this\\ntool to analyze the selected papers, which were organized in Atlas.ti \"document groups\" following the categories in\\nTable 2.\\nCodes for text analysis in Atlas.ti\\nPositive Impact Negative Impact Emerging trends Areas for improvement\\nBenefits to education Disinformation risk Impact on the tech/AI\\nmarketNeed for appropriate\\nregulation\\nBenefits to customer\\nserviceNegative impact on\\nfreedom of expressionCopyright uncertainty GDPR compliance\\nconcerns\\nBenefits of responses in\\nreal-timeBias concerns Uncertainty over liability\\nfor production failuresUncertainty of\\nclassification under the\\nAI Act\\n... ... ...\\nTable 2: Codes for text analysis in Atlas.ti\\nThe categories and codes presented in Table 2 are derived from an in-depth analysis of the selected papers. The codes\\nwere primarily developed \"in vivo,\" meaning they emerged as potential units of analysis during the reading and analysis\\nprocess. For example, when repetitive references were made to the potential positive impacts of ChatGPT in education,\\nwe created the code \" Benefits for education \". 
After reviewing and analyzing the documents, the codes were organized\\ninto four categories, which helped to address RQ1, RQ2, and RQ3. The criteria for categorization emerged from\\na thoughtful reflection on the codes and their contextual relevance. In the category of positive impacts, codes such\\nas \"24/7 Availability \" and \" Personalized feedback \" are included since they are frequently mentioned as strengths of\\nChatGPT. On the other hand, the category of negative impacts encompasses codes such as \" Bias concerns \", which\\nemerged as a recurring argument when discussing the potentially detrimental effects of this model. Other codes like\\n\"Privacy concern \" and \" Water footprint \" were identified, further emphasizing the importance of addressing these issues\\nin the context of ChatGPT’s implementation.\\nIn the category of emerging trends , we captured the unforeseen consequences, which, although potentially negative,\\narise unexpectedly from the evolution of the model itself. Examples include \" Copyright uncertainty \" and the \" Need\\nto clarify private sector liability \", which pose challenges that trigger transformations in particular domains such as\\nthe \" Impact on the tech/AI market \". This category also encompasses unexpected challenges to the AI Act [European\\nCommission, 2021] and its coverage of generative models. Another aspect within this category is \" Unintentional\\nmisinformation \", referring to instances where the chat model provides unintentionally inaccurate information, a matter\\ncurrently under scrutiny by various experts. Additionally, the category encompasses codes like \" Skepticism about its\\nactual impact \". These emerging trends shed light on the complex issues ChatGPT presents.\\nThe final category encompasses areas of improvement , focusing on aspects that require further development to address\\nall the adverse and unexpected effects of the model. 
For instance, the tag \" Negative outcomes mitigation \" highlights\\nthe need for more efforts to minimize adverse consequences. The categorization of codes as \" Uncertainty in data\\ngovernance \" also responds to the criteria as an area of improvement rather than a negative impact. The above-mentioned\\ndecision was made considering that current legal frameworks do not adequately account for models like GPT, thus\\nhighlighting the need for specific measures to address these cases, which will likely be developed in the coming years.\\nThis category also encompasses improvement areas, such as \" Limited up-to-date information \" and \" Limited Medical\\nterminology \", emphasizing the potential for enhancements.\\nAfter establishing the codes and categories, we visually represented their distribution across the 71 documents. The\\nresulting tree map in Atlas.ti, depicted in Figure 3, displays the frequency of the most repeated codes.\\nFigure 3: Treemap- Atlas.ti with codes distribution\\nTable 3 also shows the frequency of codes, highlighting the least and most recurrent codes within the Atlas.ti analysis.\\nMost recurrent # Least recurrent #\\nBias concerns 39 Unpredictability risk 1\\nDisinformation risk 25 Prone to injection attacks 1\\nBenefits for education 22 Over-regulation risk 1\\nPrivacy concern 21 Opportunity to increase renewable energy use 1\\nNeed for appropriate regulation 20 Need for using Renewable Energy Sources 1\\nDiscrimination risk 18 Need for human rights safeguards 1\\nUnfairness in the data use in the model 18 Need for explainability and traceability 1\\nBenefits for customer service 17 Need for Accessibility and Affordability 1\\nInaccurate answers 17 Limited Medical terminology 1\\nBenefits for content creation 15 Lead people into extremist positions risk 1\\nTable 3: 
Most and least recurrent codes; each category is associated with its number of occurrences.\\nIn addition, Fig.4 displays a Sankey diagram that graphically depicts the distribution of code categories across the\\nanalyzed documents (in document groups-unit). The diagram shows that scientific papers, blog posts, conference\\nsymposiums, and other types of publications encompass all coding groups (negative and positive impacts, areas of\\nimprovement, and emerging trends). See Table A2 in online appendix [Baldassarre et al., 2023] for a description\\nof document categories. The articles category includes just emerging trends, areas for improvement, and negative\\nperceptions. Most document categories, including scientific papers, columns, analyses, editorials, and articles, exhibit a\\nnegative tendency. Notably, positive perceptions are more prevalent in blog-posts, conference and symposium papers,\\nand to a lesser degree in news articles. \" Bias concern \" and \" Disinformation risk \" are two of the most common codes\\ncontributing to negative perceptions. In contrast, \" Benefits for customer service \" and \" Multidisciplinary benefits \" are the\\nmost prevalent codes in the documents with a positive trend. Table A3 in online appendix [Baldassarre et al., 2023]\\ndetails tendencies and code frequency across all document categories.\\n8The Social Impact of Generative AI A P REPRINT\\nFigure 4: Sankey diagram illustrating the distribution of code groups across document groups\\n6 Discussion\\nRQ1. What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\nThroughout our literature review, we identified several positive and negative impacts attributed to ChatGPT. 
Noteworthy\\nbenefits include: the potential for enhancing customer service; multiple papers emphasize the positive impact of\\nChatGPT in this domain [International Journal of Human Rights Law Review, 2023, Ray, 2023, Paul et al., 2023,\\nHillemann and Zimprich, 2023, Davis, 2022, Lock, 2022, Deo, 2023, Rivas and Zhao, 2023, Kumordzie, 2023, Levente,\\n2023, Marr, 2023, Abdullah et al., 2022, Khowaja et al., 2023, Gupta et al., 2023, Bruff, 2023, Iskender, 2023]. The\\nmodel is highlighted as an enabler of cross-cultural dialogue, facilitating communication between individuals from\\ndifferent cultural backgrounds [International Journal of Human Rights Law Review, 2023]. Moreover, ChatGPT offers\\nthe advantage of automating repetitive tasks , freeing time for more complex and value-added activities [Davis, 2022,\\nLevente, 2023, Khan and Umer, 2023, Gupta et al., 2023]. These benefits extend to various sectors, including business\\nand healthcare [Deo, 2023]. Another key advantage is its availability around the clock . This 24/7 accessibility proves\\nvaluable in commercial, healthcare, and educational contexts [Paul et al., 2023, Levente, 2023, Lee, 2023, Sun and\\nHoelscher, 2023, Cardoso, 2023]. The model’s continuous availability ensures timely assistance and support, covering\\nusers’ diverse needs.\\nChatGPT also demonstrates significant potential in education , offering various advantages [Marr, 2023, Shidiq, 2023,\\nLee and Yilmaz Soylu, 2023, Sun and Hoelscher, 2023, Mhlanga, 2023, Rivas and Zhao, 2023, Abdullah et al., 2022,\\nBahrini et al., 2023, Ray, 2023, Ausat et al., 2023, Solaiman et al., 2019, Lee and Yilmaz Soylu, 2023, Tiunova and\\nMuñoz, 2023, Iskender, 2023, Bruff, 2023, Geertsema et al., 2023]. Despite concerns about plagiarism and academic\\nintegrity, the model’s integration can enhance teaching practices in several ways. 
It can, for example, automate\\ncurriculum creation, enabling educators to save time and streamline the process [Marr, 2023, Bahrini et al., 2023].\\nMoreover, it facilitates the development of innovative educational content, fostering an engaging learning environment\\n[Lee, 2023, Shidiq, 2023]. Additionally, the model serves as a personalized study support assistant, providing tailored\\nguidance and assistance to individual learners [Ray, 2023, Rivas and Zhao, 2023, Sun and Hoelscher, 2023, Mhlanga,\\n2023, Abdullah et al., 2022, Ausat et al., 2023]. In this perspective, a vision emphasizes the need for controlled\\nintegration and adherence to academic guidelines to ensure responsible and ethical use of generative AI models in\\neducation [Sun and Hoelscher, 2023]. By establishing appropriate regulations and ethical frameworks, the educational\\nbenefits of ChatGPT can be maximized while addressing concerns related to plagiarism and promoting an enriching\\nlearning experience.\\nIn the medical field, the model has shown promise in research, data analysis, and telemedicine applications, contributing\\nto advancements in healthcare [Bahrini et al., 2023]. Furthermore, diverse papers recognize its potential contribution to\\naddressing environmental challenges . For instance, it may help find innovative solutions to reduce water consumption\\nin the AI industry, highlighting its role in promoting sustainability [George et al., 2023]. Similarly, ChatGPT is also\\nappreciated for its potential as an informative and accountability tool within institutions [Ray, 2023, Cardoso, 2023,\\nBiswas, 2023a]. Lastly, it positively impacts journalism, where it can help with content creation, fact-checking, and\\ngenerating engaging narratives [Davis, 2022, Marr, 2023, Bahrini et al., 2023].\\nConversely, the most recurrent social concern is bias. 
Several papers [International Journal of Human Rights Law\\nReview, 2023, Ray, 2023, Paul et al., 2023, Lee, 2023, Shidiq, 2023, Biddle, 2023, C and J, 2023, Equality Now,\\n2023, Robson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023a, Vidhya et al., 2023, Khan and Umer, 2023, Ausat et al.,\\n2023, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Treude and Hata, 2023, Farrell et al., 2022, Tiunova\\nand Muñoz, 2023, Thirunavukarasu et al., 2023, Stepanechko and Kozub, 2023, Geertsema et al., 2023] discuss the\\nrisk of deepening existing biases and how ChatGPT can include sexist and racist views due to the characteristics of\\nthe data used during its training. In this sense, there is a special concern about using these tools in various sectors,\\nincluding finance [Khan and Umer, 2023] and other social activities, which can replicate and deepen structural and\\nhistorical inequalities. Our analysis also reveals another perceived negative impact, which is its potential to generate\\nfalse information and facilitate the spread of disinformation [Ray, 2023, Paul et al., 2023, Hillemann and Zimprich,\\n2023, Davis, 2022, Lock, 2022, Bahrini et al., 2023, Equality Now, 2023, Robson, 2023, Li, 2023, Biswas, 2023a,\\nVidhya et al., 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Wolf, 2023, Vallance,\\n2022, Rozado, 2023-03, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Farrell et al., 2022, Tiunova and\\nMuñoz, 2023, Khowaja et al., 2023]. This misuse of technology directly affects rights related to access to accurate\\ninformation and freedom of expression, as well as democratic stability. This phenomenon is particularly relevant as,\\nalthough remarkable in artificial intelligence, the advancements and improvements in generative models have raised\\nconcerns and uncertainties regarding their impact on democratic processes . 
There is a special concern about the\\nease with which false information can be generated and disseminated, especially in critical contexts such as elections,\\nreferendums, political instability, war conflicts, or under dictatorial regimes. Furthermore, this potential negative impact\\nincludes the digital public sphere, where spreading hate speech on social networks [Institute for Human Rights and\\nBusiness, 2023] can lead to social fragmentation and bolster the manipulation of democratic institutions and social\\ncontrol. False information can be weaponized across various domains, from the stock market to information warfare\\nand propaganda.\\nIn the same vein, another relevant concern is privacy [International Journal of Human Rights Law Review, 2023, Ray,\\n2023, Paul et al., 2023, Deo, 2023, Lee, 2023, Bahrini et al., 2023, Mhlanga, 2023, C and J, 2023, Biswas, 2023a,\\nRobson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023b, Vidhya et al., 2023, Khan and Umer, 2023, Helberger and\\nDiakopoulos, 2023, Vallance, 2022, Khlaif, 2023, Paul et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nIskender, 2023]. Different aspects contribute to this concern, including the coverage of these models under AI Act\\nregulations, its extensive use of user data (particularly minors), the potential for surveillance applications, the data\\nvulnerability to cyber attacks targeting ChatGPT, and the privacy implications when integrating the model into various\\ndomains such as education and the military. Privacy is a growing concern as illustrated by the case of the Italian state\\nban. In this case, the Italian authorities asked OpenAI to “expand its privacy policy for users and made it also accessible\\nfrom the sign-up page prior to registration with the service” [Garante per la protezione dei dati personali, 2023] in order\\nto operate in Italy. 
This case highlights the urgency for ensuring responsible and transparent use of the technology.\\nAnother important negative impact is the risk of job loss , as automation and AI capabilities advance [Davis, 2022, Deo,\\n2023, Rivas and Zhao, 2023, Lee and Yilmaz Soylu, 2023, Biddle, 2023, Robson, 2023, Khan and Umer, 2023, Air\\net al., 2023, Wolf, 2023, Curtis and ChatGPT§, 2023, Gabbiadini et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023].\\nFurthermore, other papers show apprehension about over-regulation [Helberger and Diakopoulos, 2023], highlighting\\nthe needed balance between ensuring ethical and responsible use of AI technologies while avoiding stifling innovation.\\nAdditionally, the model’s own cybersecurity is listed as a negative impact [Levente, 2023, Bahrini et al., 2023, C\\nand J, 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Khowaja et al., 2023].\\nThe comprehensive list of negative perceptions on ChatGPT impacts can be found in Table A1 in online appendix\\n[Baldassarre et al., 2023], providing further insights into the concerns found in the literature.\\nRQ2. 
What are the emerging trends perceived in ChatGPT development?\\nAn essential part of our review shows emerging trends [Hillemann and Zimprich, 2023, Telegraph, 2023, DiBenedetto,\\n2023, Deo, 2023, Equality Now, 2023, Institute for Human Rights and Business, 2023, Kumordzie, 2023, Vallance,\\n2022, Perrigo, 2023, Al Ashry, 2023, Bareis, 2023, Curtis and ChatGPT§, 2023, Rutinowski et al., 2023, Lee and\\nYilmaz Soylu, 2023, Rozado, 2023-03, George et al., 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas, 2023b,\\nVidhya et al., 2023, Suguri Motoki et al., 2023, Rivas and Zhao, 2023, Khlaif, 2023, Bahrini et al., 2023, Ray, 2023, Khan\\nand Umer, 2023, Ausat et al., 2023, Paul et al., 2023, Tamkin et al., 2021, Solaiman et al., 2019, Lee and Yilmaz Soylu,\\n2023, Geertsema et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023, Bruff, 2023]. Within this category we found uncertainty surrounding copyright , which is a prominent trend,\\nraising doubts about the possibility of profiting from chat-generated content and whether such content is subject to\\ncopyright protection. Resolving these issues requires legislative intervention, leading to a particular and complex debate\\namong authorities regarding the legal implications of AI-generated content [Hillemann and Zimprich, 2023]. There is\\nalso a growing demand for the enhancement of regulatory frameworks to safeguard original content, considering that\\nmodels like ChatGPT have the potential to negatively impact the work of scientists, writers, researchers, and artists\\n[Al Ashry, 2023, Ray, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023]. Notably, analyses such as by Khowaja et al. [Khowaja et al., 2023] raise crucial questions regarding ownership\\nrights over the data used to train the model and the ownership of the model itself. 
As mentioned earlier, there is a\\ndominant call for establishing ethical use guidelines , both at a general societal level [Gabbiadini et al., 2023] and\\nspecifically within educational and research institutions [Mhlanga, 2023, Khlaif, 2023, Ausat et al., 2023, Solaiman\\net al., 2019, Tamkin et al., 2021, Lee and Yilmaz Soylu, 2023, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023,\\nIskender, 2023, Stepanechko and Kozub, 2023, Bruff, 2023]. These guidelines will be pivotal in ensuring responsible\\nadoption of models such as ChatGPT.\\nAnother emerging trend is developing transparency mechanisms [Equality Now, 2023, Lee and Yilmaz Soylu, 2023,\\nLee, 2023, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nStepanechko and Kozub, 2023]. Transparency is considered a vital strategy to address resistance toward adopting\\nthese models in various contexts [Bahrini et al., 2023] and to mitigate potential AI stigmatization. However, it is also\\nacknowledged that transparency poses challenges that need to be overcome [Khowaja et al., 2023].\\nAnother crucial issue to highlight is the accountability for potentially harmful uses of such technology [Deo, 2023,\\nBiswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and Kozub,\\n2023, Helberger and Diakopoulos, 2023], which involves both the end-users and the companies responsible for its\\ndevelopment [Helberger and Diakopoulos, 2023, Deo, 2023, Institute for Human Rights and Business, 2023, George\\net al., 2023, Biswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and\\nKozub, 2023]. This concern extends to applications in sensitive domains like the military [Deo, 2023]. 
Furthermore,\\nseveral other notable trends emerge, including the need for timely and appropriate regulation [Gabbiadini et al.,\\n2023, Bruff, 2023], the existence of political bias (29, 40, 42, 62), transformations within the AI market [Deo,\\n2023, Equality Now, 2023, Kumordzie, 2023, Perrigo, 2023, Tamkin et al., 2021, Farrell et al., 2022, Geertsema et al.,\\n2023], and the integration of renewable technologies and environmental awareness within this field [C and J, 2023,\\nInternational Journal of Human Rights Law Review, 2023].\\nRQ3. Which areas of improvement can be identified in the development of such technologies?\\nThe findings concerning areas for improvement within the field of generative AI present a diverse range of perspectives\\n[Hillemann and Zimprich, 2023, Helberger and Diakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023,\\nInstitute for Human Rights and Business, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al., 2023, Kumordzie,\\n2023, Wolf, 2023, Vallance, 2022, Perrigo, 2023, Al Ashry, 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas,\\n2023a, Zhuo et al., 2023, Abdullah et al., 2022, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Solaiman et al., 2019, Tamkin et al., 2021, C and J, 2023, Tiunova and Muñoz,\\n2023, Khowaja et al., 2023, Gabbiadini et al., 2023, Gupta et al., 2023, Iskender, 2023, Stepanechko and Kozub, 2023,\\nBruff, 2023]. 
One prominent area is the examination of regulations [Hillemann and Zimprich, 2023, Helberger and\\nDiakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023, Institute for Human Rights and Business, 2023,\\nKumordzie, 2023, Wolf, 2023, Al Ashry, 2023, George et al., 2023, Levente, 2023, Lee and Yilmaz Soylu, 2023, Ray,\\n2023, Solaiman et al., 2019, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023, Bruff, 2023],\\nparticularly regarding whether the risk-based approach outlined in the AI Act effectively covers generative models\\n[Helberger and Diakopoulos, 2023, Wolf, 2023]. It is suggested that comprehensive guidelines should encompass the\\nentire spectrum, from its application to the AI Research and Development (R&D) [George et al., 2023]. Furthermore,\\nthere is a growing advocacy for a people-centred vision [Wolf, 2023, Kumordzie, 2023, Ray, 2023] that emphasizes\\nthe importance of human rights and ethical considerations in designing and implementing generative AI systems.\\nSpecifically, Solaiman et al. [Solaiman et al., 2019] examine the need to \"build frameworks for navigating trade-offs\"\\nand develop decision-making frameworks that account for the complexities and potential trade-offs associated with\\ngenerative AI. Similarly, Li [Levente, 2023] highlights the necessity to transform the European regulatory paradigm to\\neffectively address the challenges posed by LLMs.\\nAnother area of opportunity lies in addressing technical limitations within generative AI models [Levente, 2023, Curtis\\nand ChatGPT§, 2023, Sun and Hoelscher, 2023, Abdullah et al., 2022, Bahrini et al., 2023, Cao et al., 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Thirunavukarasu et al., 2023, Gupta\\net al., 2023, Iskender, 2023]. 
For instance, a significant challenge is the presence of fictional references [Tiunova\\nand Muñoz, 2023] within the generated text, which hampers its reliability. Additionally, word repetition further\\naffects the produced text’s overall quality [Wolf, 2023]. Moreover, the phenomenon of inaccurate information, named\\n\"hallucinations \" [Wolf, 2023, Tamkin et al., 2021], and the lack of context understanding [Paul et al., 2023], are also\\nidentified as areas requiring attention and improvement. Furthermore, the literature highlights the need for enhanced\\nrisk mitigation mechanisms [Equality Now, 2023, Biddle, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al.,\\n2023, Vallance, 2022, Perrigo, 2023]. This entails refining processes for filtering potentially harmful responses\\n[Biddle, 2023], improving the quality and reliability of the data used to train the models [Levente, 2023], and\\nincorporating ethical guidelines for its development [Air et al., 2023], among other measures [Perrigo, 2023].\\nData governance is another significant area that requires attention [Deo, 2023, Ray, 2023, Khan and Umer, 2023,\\nTiunova and Muñoz, 2023]; this crucial task involves safeguarding sensitive information against security breaches,\\nunauthorized access, and information theft [Ray, 2023, Khan and Umer, 2023]. In the same vein, establishing clear\\nguidelines regarding the scope and limitations of information exchange with third parties is also paramount\\n[Tiunova and Muñoz, 2023].\\nOther areas of opportunity include the need for up-to-date data [Sun and Hoelscher, 2023, Zhuo et al., 2023, Abdullah\\net al., 2022, Gupta et al., 2023] to ensure the accuracy and relevance of generative AI models. 
However, this requirement\\npresents a trade-off between incorporating new data to improve performance or addressing data governance issues first.\\nThe literature review also highlights several other areas of opportunity for improvement; these include promoting\\nend-user responsibilities [Li, 2023], advocating for timely regulation [Li, 2023, Tiunova and Muñoz, 2023], and\\nraising awareness of environmental impact [Tiunova and Muñoz, 2023, Geertsema et al., 2023], among other\\nconsiderations.\\n7 CONCLUSION AND FUTURE WORK\\nWhile we are still in the early stages of evaluating the social impact of generative AI models, this systematic literature\\nreview allows us to gain initial insights into the perceptions surrounding their emergence in contemporary society.\\nOur analysis has revealed notable areas of concern, particularly privacy and the potential for bias . As generative\\nmodels continue to be adopted in diverse social contexts, addressing and mitigating issues related to inequality, bias,\\ndiscrimination, and stereotypes becomes urgent.\\nIn light of this, it is essential to note that generative AI reflects the social context in which it was created. As these\\nmodels are trained on data that captures various aspects of our reality, it becomes clear that addressing their flaws and\\nbiases requires a comprehensive understanding of the broader social context within which they operate. Rather than\\nsolely focusing on repairing the model, it is imperative to also engage in a critical examination of the social factors that\\ncontribute to these biases and limitations.\\nSome analyses argue that generative AI models, such as ChatGPT, are not intended to address social inequalities.\\nWhile this may be true, it is also essential that these models do not inadvertently contribute to exacerbating social\\nissues. Acknowledging that scientific breakthroughs do not occur within a social vacuum is critical. 
Therefore,\\nwe must foster a conscious, responsible, and ethically-driven progression of generative AI. Equally important is to\\nemphasize that generative AI models hold immense potential and offer substantial benefits across various fields and\\nsectors, including education, medicine, marketing, business, research, and science. Their impact extends beyondsectors, including education, medicine, marketing, business, research, and science. Their impact extends beyond\\ninnovation and significantly influences the legislative landscape. Consequently, policymakers need to address the\\nnecessity for appropriate regulation that not only addresses significant concerns associated with their use, but also\\nsupports and facilitates the ethical development of generative AI models [Clayton, 2023, Anthropic, b]. Undoubtedly,\\nAI generative models have reshaped our way of being in the world, triggering profound changes in our perception\\nand engagement with it. In our relentless pursuit to emulate human interaction, we have also confronted stereotypes,\\nbiases, and imperfections. Rather than succumbing to discouragement, we should use them as motivation to address\\nthem diligently and strive for continuous improvement. More work is required to develop more robust frameworks\\nand ethical guidelines, not only to improve accuracy and efficiency but also to ensure responsible deployment. As\\npart of future work, we propose evaluating ChatGPT regulation in the US, Europe, and Latin America. This analysis\\nwill examine how current legal tools address generative models’ challenges in particular locations. Additionally, to\\nunderstand ChatGPT users’ professional and social views, a survey has been designed and will be distributed among\\nprofessionals and researchers from diverse universities and research centers worldwide. 
With these exercises, we want\\nto gain a comprehensive understanding of its adoption, regulatory challenges, user perspectives, and deepening into its\\nsocial impact.\\n12The Social Impact of Generative AI A P REPRINT\\nReferences\\nJoe McKendrick. Learning to trust artificial intelligence: An optimist’s view,\\n2019. URL https://www.forbes.com/sites/joemckendrick/2019/06/09/\\nlearning-to-trust-artificial-intelligence-an-optimists-view/ .\\nHenry Farrell, Abraham Newman, and Jeremy Wallace. Spirals of delusion how al distorts decision-making and\\nmakes dictators more dangerous. 2022. ISSN 0015-7120. URL https://www.foreignaffairs.com/world/\\nspirals-delusion-artificial-intelligence-decision-making .\\nCindy Gordon. Will 2023 be the year that OpenAI’s ChatGPT breaks free?, 2022. URL https://www.forbes.com/\\nsites/cindygordon/2022/12/29/will-2023-be-the-year-that-openais-chatgpt-breaks-free/ .\\nTrevor J. Pinch and Wiebe E. Bijker. The social construction of facts and artefacts: Or how the sociology of\\nscience and the sociology of technology might benefit each other. 14(3):399–441, 1984. ISSN 0306-3127. URL\\nhttps://www.jstor.org/stable/285355 .\\nAcoustic Modeling Using Deep Belief Networks | IEEE Journals & Magazine | IEEE Xplore, 2012. URL https:\\n//ieeexplore.ieee.org/document/5704567 .\\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and\\nIllia Polosukhin. Attention Is All You Need, December 2017. URL http://arxiv.org/abs/1706.03762 .\\narXiv:1706.03762 [cs].\\nJacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Trans-\\nformers for Language Understanding, May 2019. URL http://arxiv.org/abs/1810.04805 . arXiv:1810.04805\\n[cs].\\nZhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. XLNet: Generalized\\nAutoregressive Pretraining for Language Understanding. 
In Advances in Neural Information Processing Systems ,\\nvolume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/\\n2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html .\\nDaniel Jurafsky and James Martin. Speech and Language Processing: An Introduction to Natural Language Processing,\\nComputational Linguistics, and Speech Recognition , volume 2. February 2008.\\nAlec Radford and Karthik Narasimhan. Improving language understanding by generative pre-training. 2018.\\nOpenAI. Openai, 2018. URL https://openai.com/research/language-unsupervised .\\nOpenAI. Gpt-4, a. URL https://openai.com/product/gpt-4 .\\nOpenAI. Openai publishing results refusal, b. URL https://cdn.openai.com/papers/gpt-4.pdf .\\nGoogle. Palm2, a. URL https://ai.google/discover/palm2 .\\nTogether. Gpt-jt. URL https://bit.ly/3TqjFes .\\nMeta. Llama. URL https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ .\\nOpenAI. Instructgpt, c. URL https://openai.com/research/instruction-following .\\nOpenAI. Chatgpt, d. URL https://openai.com/blog/chatgpt .\\nBing. Bing chat. URL https://www.bing.com/new .\\nGoogle. Bard, b. URL https://bard.google.com/ .\\nGoogle. Lamda, c. URL https://blog.google/technology/ai/lamda/ .\\nJasper. Jasper chat. URL https://www.jasper.ai/chat .\\nAnthropic. Claude, a. URL https://www.anthropic.com/index/claudes-constitution .\\nOpenAI DALL-E 2. Dall-e 2. URL https://openai.com/product/dall-e-2 .\\nStable Diffusion. Stable diffusion. URL https://stablediffusionweb.com/ .\\nWritesonic. Photosonic art generator. URL https://writesonic.com/photosonic-ai-art-generator .\\nRunway. Runway. URL https://research.runwayml.com/gen2 .\\nMeta Make-A-Video. Make-a-video. URL https://makeavideo.studio/ .\\nGoogle Imagen. Imagen video. URL https://imagen.research.google/ .\\nRuben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi\\nSaffar, Santiago Castro, Julius Kunze, and Dumitru Erhan. 
Phenaki. URL https://phenaki.video/ .\\n13The Social Impact of Generative AI A P REPRINT\\nBloomberg. Bloomberggpt. URL https://www.bloomberg.com/company/press/\\nbloomberggpt-50-billion-parameter-llm-tuned-finance/ .\\nSlack. Slackgpt. URL https://slack.com/intl/en-gb/blog/news/introducing-slack-gpt .\\nMicrosoft. Biogpt. URL https://github.com/microsoft/BioGPT .\\nGitHub. Copilot. URL https://github.com/features/copilot .\\nBruno Cartaxo, Gustavo Pinto, and Sergio Soares. The role of rapid reviews in supporting decision-making in software\\nengineering practice. In Proceedings of the 22nd International Conference on Evaluation and Assessment in Software\\nEngineering 2018 , EASE’18, page 24–34, New York, NY , USA, 2018. Association for Computing Machinery. ISBN\\n9781450364034. doi:10.1145/3210459.3210462. URL https://doi.org/10.1145/3210459.3210462 .\\nBarbara Kitchenham and Stuart Charters. Guidelines for performing systematic literature reviews in software engineer-\\ning. 2, 01 2007.\\nJan Piasecki, Marcin Waligora, and Vilius Dranseika. Google search as an additional source in systematic reviews.\\nScience and Engineering Ethics , 24, 12 2017. doi:10.1007/s11948-017-0010-4.\\nMaria Teresa Baldassarre, Danilo Caivano, Berenice Fernàndez Nieto, Domenico Gigante, and Azzurra Ragone.\\nhttps://figshare.com/s/77c3a667671472f8eccc, 2023. URL https://figshare.com/s/77c3a667671472f8eccc .\\nEuropean Commission. Proposal for a regulation of the european parliament and of the council laying down harmonised\\nrules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts, 2021. URL\\nhttps://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206 .\\nInternational Journal of Human Rights Law Review. ChatGPT and human rights: Nav-\\nigating the technological frontier, 2023. URL https://humanrightlawreview.in/\\nchatgpt-and-human-rights-navigating-the-technological-frontier/ .\\nPartha Pratim Ray. 
ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics,\\nlimitations and future scope. 3:121–154, 2023. ISSN 2667-3452. doi:10.1016/j.iotcps.2023.04.003. URL https:\\n//www.sciencedirect.com/science/article/pii/S266734522300024X .\\nJustin Paul, Akiko Ueno, and Charles Dennis. ChatGPT and consumers: Benefits, pitfalls and future research agenda.\\npage ijcs.12928, 2023. ISSN 1470-6423, 1470-6431. doi:10.1111/ijcs.12928. URL https://onlinelibrary.\\nwiley.com/doi/10.1111/ijcs.12928 .\\nDennis Hillemann and Stephan Zimprich. ChatGPT - legal challenges, legal opportunities, 2023. URL https:\\n//www.fieldfisher.com/en/insights/chatgpt-legal-challenges-legal-opportunities .\\nJessica Davis. ChatGPT: Enterprises eye use cases, ethicists remain con-\\ncerned, 2022. URL https://www.informationweek.com/big-data/\\nchat-gpt-enterprises-eye-use-cases-ethicists-remain-concerned .\\nSamantha Lock. What is AI chatbot phenomenon ChatGPT and could it replace humans?\\n2022. ISSN 0261-3077. URL https://www.theguardian.com/technology/2022/dec/05/\\nwhat-is-ai-chatbot-phenomenon-chatgpt-and-could-it-replace-humans .\\nPriyanka Deo. Is ChatGPT dangerous for humanity? 2023. ISSN 0971-8257. URL https://timesofindia.\\nindiatimes.com/business/india-business/is-chatgpt-dangerous-for-humanity/articleshow/\\n98471105.cms .\\nPablo Rivas and Liang Zhao. Marketing with ChatGPT: Navigating the ethical terrain of GPT-based chatbot technology.\\n4(2):375–384, 2023. ISSN 2673-2688. doi:10.3390/ai4020019. URL https://www.mdpi.com/2673-2688/4/2/\\n19.\\nDeladem Kumordzie. All you need to know about ChatGPT & why its\\na threat to google, 2023. URL https://medium.com/@cdkumordzie/\\nall-you-need-to-know-about-chatgpt-why-its-a-threat-to-google-fd2b887c8ff8 .\\nLevente. The pros and cons dark side of using chat GPT for businesses, 2023. URL https://medium.com/\\n@Levente22/the-pros-and-cons-dark-side-of-using-chat-gpt-for-businesses-cf2373119dab .\\nBernard Marr. 
What does ChatGPT really mean for your job?, 2023. URL https://www.forbes.com/sites/\\nbernardmarr/2023/02/13/what-does-chatgpt-really-mean-for-your-job/ .Bernard Marr. What does ChatGPT really mean for your job?, 2023. URL https://www.forbes.com/sites/\\nbernardmarr/2023/02/13/what-does-chatgpt-really-mean-for-your-job/ .\\nMalak Abdullah, Alia Madain, and Yaser Jararweh. ChatGPT: Fundamentals, applications and social impacts. In 2022\\nNinth International Conference on Social Networks Analysis, Management and Security (SNAMS) , pages 1–8, 2022.\\ndoi:10.1109/SNAMS58071.2022.10062688. ISSN: 2831-7343.\\n14The Social Impact of Generative AI A P REPRINT\\nSunder Ali Khowaja, Parus Khuwaja, and Kapal Dev. ChatGPT needs SPADE (sustainability, PrivAcy, digital divide,\\nand ethics) evaluation: A review, 2023. URL http://arxiv.org/abs/2305.03123 .\\nBulbul Gupta, Tabish Mufti, Shahab Saquib Sohail, and Dag Øivind Madsen. ChatGPT: A brief narrative review. 2023.\\ndoi:10.20944/preprints202304.0158.v1. URL https://www.preprints.org/manuscript/202304.0158/v1 .\\nDerek Bruff. Teaching in the artificial intelligence age of ChatGPT | teaching + learning lab, 2023. URL https:\\n//tll.mit.edu/teaching-in-the-artificial-intelligence-age-of-chatgpt/ .\\nAli Iskender. Holy or unholy? interview with open AI’s ChatGPT. 34:3414–3414, 2023. ISSN 1314-0817.\\ndoi:10.54055/ejtr.v34i.3169. URL https://ejtr.vumk.eu/index.php/about/article/view/3169 .\\nMuhammad Salar Khan and Hamza Umer. Chatgpt in finance: Addressing ethical challenges, 2023. URL https:\\n//papers.ssrn.com/abstract=4439967 .\\nHyunsu Lee. The rise of ChatGPT: Exploring its potential in medical education. page ase.2270, 2023. ISSN 1935-9772,\\n1935-9780. doi:10.1002/ase.2270. URL https://anatomypubs.onlinelibrary.wiley.com/doi/10.1002/\\nase.2270 .\\nGrace H. Sun and Stephanie H. Hoelscher. The ChatGPT storm and what faculty can do. 48(3):119,\\n2023. ISSN 0363-3624. doi:10.1097/NNE.0000000000001390. 
URL https://journals.lww.com/\\nnurseeducatoronline/Fulltext/2023/05000/The_ChatGPT_Storm_and_What_Faculty_Can_Do.1.\\naspx?context=FeaturedArticles&collectionId=5 .\\nAndré Guskow Cardoso. Do we need a chat-GPT-gov? the importance of technology for effective access to public\\ninformation., 2023. URL https://papers.ssrn.com/abstract=4365773 .\\nMuhammad Shidiq. The use of artificial intelligence-based chat-gpt and its challenges for the world of education;\\nfrom the viewpoint of the development of creative writing skills. 1(1):353–357, 2023. ISSN 2986-5832. URL\\nhttps://ejournal.unuja.ac.id/index.php/icesh/article/view/5614 .\\nJeonghyun Lee and Meryem Yilmaz Soylu. ChatGPT and assessment in higher education, 2023. URL\\nhttps://c21u.gatech.edu/sites/default/files/publication/2023/03/C21U%20ChatGPT%20White%\\n20Paper_Final.pdf .\\nDavid Mhlanga. Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning, 2023.\\nURL https://papers.ssrn.com/abstract=4354422 .\\nAram Bahrini, Mohammadsadra Khamoshifar, Hossein Abbasimehr, Robert J. Riggs, Maryam Esmaeili, Rastin Mastali\\nMajdabadkohne, and Morteza Pasehvar. ChatGPT: Applications, opportunities, and threats, 2023. URL http:\\n//arxiv.org/abs/2304.09103 .\\nAbu Muna Almaududi Ausat, Berdinata Massang, Mukhtar Efendi, Nofirman Nofirman, and Yasir Riady. Can chat GPT\\nreplace the role of the teacher in the classroom: A fundamental analysis. 5(4):16100–16106, 2023. ISSN 2654-5497.\\ndoi:10.31004/joe.v5i4.2745. URL https://www.jonedu.org/index.php/joe/article/view/2745 .\\nIrene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-V oss, Jeff Wu, Alec Radford, Gretchen\\nKrueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, and Jasmine\\nWang. Release strategies and the social impacts of language models, 2019. URL http://arxiv.org/abs/1908.\\n09203 .\\nAlla Tiunova and Felipe Muñoz. 
Chatgpt: Using ai in social studies academic research, 2023. URL https://papers.\\nssrn.com/abstract=4451612 .\\nPaul Geertsema, Albert Bifet, and Richard Green. ChatGPT and large language models: What are the implications for\\npolicy makers?, 2023. URL https://papers.ssrn.com/abstract=4424048 .\\nA.Shaji George, A.S.Hovan George, and A.S.Gabrio Martin. The environmental impact of AI: A case study of water\\nconsumption by chat GPT. 2023. doi:10.5281/ZENODO.7855594. URL https://zenodo.org/record/7855594 .\\nSom S Biswas. Role of chat gpt in public health. Annals of Biomedical Engineering , pages 1–2, 2023a.\\nSam Biddle. The internet’s new favorite AI proposes torturing iranians and surveilling mosques, 2023. URL https:Sam Biddle. The internet’s new favorite AI proposes torturing iranians and surveilling mosques, 2023. URL https:\\n//theintercept.com/2022/12/08/openai-chatgpt-ai-bias-ethics/ .\\nDavid C and Paul J. ChatGPT and large language models: what’s the risk?, 2023. URL https://www.ncsc.gov.uk/\\nblog-post/chatgpt-and-large-language-models-whats-the-risk .\\nEquality Now. ChatGPT-4 reinforces sexist stereotypes by stating a girl cannot “handle technicali-\\nties and numbers” in engineering, 2023. URL https://www.equalitynow.org/news_and_insights/\\nchatgpt-4-reinforces-sexist-stereotypes/ .\\n15The Social Impact of Generative AI A P REPRINT\\nKurt Robson. Do AI chatbots like ChatGPT pose a major cybersecurity risk?, 2023. URL https://www.verdict.\\nco.uk/do-ai-chatbots-like-chatgpt-pose-a-major-cybersecurity-risk/ .\\nBilly Perrigo. Exclusive: The $2 per hour workers who made ChatGPT safer, 2023. URL https://time.com/\\n6247678/openai-chatgpt-kenya-workers/ .\\nZihao Li. The dark side of ChatGPT: Legal and ethical challenges from stochastic parrots and hallucination, 2023.\\nURL http://arxiv.org/abs/2304.14347 .\\nN. Gowri Vidhya, D. Devi, Nithya A, and T. Manju. Prognosis of exploration on chat GPT with artificial\\nintelligence ethics. 2(9):60–69, 2023. 
ISSN 2764-3417. doi:10.14295/bjs.v2i9.372. URL https://www.\\nbrazilianjournalofscience.com.br/revista/article/view/372 .\\nAlex Tamkin, Miles Brundage, Jack Clark, and Deep Ganguli. Understanding the capabilities, limitations, and societal\\nimpact of large language models, 2021. URL http://arxiv.org/abs/2102.02503 .\\nChristoph Treude and Hideaki Hata. She elicits requirements and he tests: Software engineering gender bias in large\\nlanguage models, 2023. URL http://arxiv.org/abs/2303.10131 .\\nArun James Thirunavukarasu, Refaat Hassan, Shathar Mahmood, Rohan Sanghera, Kara Barzangi, Mohanned El\\nMukashfi, and Sachin Shah. Trialling a large language model (ChatGPT) in general practice with the applied\\nknowledge test: Observational study demonstrating opportunities and limitations in primary care. 9(1):e46599, 2023.\\ndoi:10.2196/46599. URL https://mededu.jmir.org/2023/1/e46599 .\\nOksana Stepanechko and Liubov Kozub. English teachers’ concerns about the ethical use of chat GPT by university\\nstudents. (25):297–302, 2023. ISSN 2710-3056. doi:10.36074/grail-of-science.17.03.2023.051. URL https:\\n//archive.journal-grail.science/index.php/2710-3056/article/view/1040 .\\nNatali Helberger and Nicholas Diakopoulos. ChatGPT and the AI act. 12(1), 2023. ISSN 2197-6775.\\ndoi:10.14763/2023.1.1682. URL https://policyreview.info/essay/chatgpt-and-ai-act .\\nChristopher Air, Shanaka Wijetunge, and Alexander Dimitrov. The ethics of AI: The cyber risks\\nposed by chat GPT, 2023. URL https://www.dacbeachcroft.com/en/gb/articles/2023/february/\\nthe-ethics-of-ai-the-cyber-risks-posed-by-chat-gpt .\\nZachary B. Wolf. AI can be racist, sexist and creepy. what should we do about it? | CNN politics, 2023. URL\\nhttps://www.cnn.com/2023/03/18/politics/ai-chatgpt-racist-what-matters/index.html .\\nChris Vallance. ChatGPT: New AI chatbot has everyone talking to it. 2022. URL https://www.bbc.com/news/\\ntechnology-63861322 .\\nDavid Rozado. The political biases of ChatGPT. 
12(3):148, 2023-03. ISSN 2076-0760. doi:10.3390/socsci12030148.\\nURL https://www.mdpi.com/2076-0760/12/3/148 .\\nInstitute for Human Rights and Business. We asked ChatGPT about its impact on human rights and business. here’s what\\nit told us, 2023. URL https://www.ihrb.org/focus-areas/information-communication-technology/\\nwe-asked-chatgpt-about-its-impact-on-human-rights-on-business-heres-what-it-told-us .\\nSom Biswas. Prospective role of chat GPT in the military: According to ChatGPT. 2023b. ISSN 2632-3834.\\ndoi:10.32388/8WYYOD. URL https://www.qeios.com/read/8WYYOD .\\nZuheir N. Khlaif. Ethical concerns about using AI-generated text in scientific research, 2023. URL https://papers.\\nssrn.com/abstract=4387984 .\\nGarante per la protezione dei dati personali. ChatGPT: OpenAI riapre la piattaforma in italia garantendo più trasparenza\\ne più diritti a utenti e non utenti europei, 2023. URL https://www.garanteprivacy.it:443/home/docweb/-/\\ndocweb-display/docweb/9881490 .\\nNigel Curtis and ChatGPT§. To chatgpt or not to chatgpt? the impact of artificial intelligence on academic publishing.\\n42(4):275, 2023. ISSN 0891-3668. doi:10.1097/INF.0000000000003852. URL https://journals.lww.com/\\npidj/Citation/2023/04000/To_ChatGPT_or_not_to_ChatGPT__The_Impact_of.1.aspx .\\nAlessandro Gabbiadini, Ognibene Dimitri, Cristina Baldissarri, and Anna Manfredi. Does ChatGPT pose a threat topidj/Citation/2023/04000/To_ChatGPT_or_not_to_ChatGPT__The_Impact_of.1.aspx .\\nAlessandro Gabbiadini, Ognibene Dimitri, Cristina Baldissarri, and Anna Manfredi. Does ChatGPT pose a threat to\\nhuman identity?, 2023. URL https://papers.ssrn.com/abstract=4377900 .\\nTech Telegraph. Microsoft, OpenAI, alphabet and big tech are ignoring the human cost behind the rise of ChatGPT and\\nother AI-powered chatbots. 2023. ISSN 0307-1235. URL https://bit.ly/3uXvdwp .\\nChase DiBenedetto. ChatGPT’s surprisingly human voice came with a human cost, 2023. 
URL https://mashable.\\ncom/article/chat-gpt-open-ai-workers-exploitation .\\n16The Social Impact of Generative AI A P REPRINT\\nAyman Al Ashry. Chat GPT and its legal impact on society as a new form of AI - copy-\\nright - united arab emirates, 2023. URL https://www.mondaq.com/copyright/1299882/\\nchat-gpt-and-its-legal-impact-on-society-as-a-new-form-of-ai .\\nJascha Bareis. We are scared of the question chat-GPT cannot answer. because the answer is too obvious., 2023. URL\\nhttps://papers.ssrn.com/abstract=4410324 .\\nJérôme Rutinowski, Sven Franke, Jan Endendyk, Ina Dormuth, and Markus Pauly. The self-perception and political\\nbiases of ChatGPT, 2023. URL http://arxiv.org/abs/2304.07333 .\\nFabio Suguri Motoki, Valdemar Pinho Neto, and Victor Rodrigues. More human than human: Measuring ChatGPT\\npolitical bias. 2023. doi:10.2139/ssrn.4372349. URL https://ueaeprints.uea.ac.uk/id/eprint/91668/ .\\nWeeTech Solution. What is ChatGPT and the benefits of using ChatGPT, 2023. URL https://www.\\nweetechsolution.com/blog/what-is-chat-gpt-and-the-advantages-of-using-chat-gpt .\\nTerry Yue Zhuo, Yujin Huang, Chunyang Chen, and Zhenchang Xing. Exploring AI ethics of ChatGPT: A diagnostic\\nanalysis, 2023. URL http://arxiv.org/abs/2301.12867 .\\nAdam Sobieszek and Tadeusz Price. Playing games with ais: The limits of GPT-3 and similar large language models.\\n32(2):341–364, 2022. ISSN 1572-8641. doi:10.1007/s11023-022-09602-0. URL https://doi.org/10.1007/\\ns11023-022-09602-0 .\\nYihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun. A comprehen-\\nsive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. 2023.\\ndoi:10.48550/ARXIV .2303.04226. URL https://arxiv.org/abs/2303.04226 .\\nJames Clayton. Sam altman: CEO of OpenAI calls for US to regulate artificial intelligence. 2023. URL https:\\n//www.bbc.com/news/world-us-canada-65616866 .\\nAnthropic. 
Anthropic raises $450 million in series c funding to scale reliable. . . , b. URL https://www.anthropic.\\ncom/index/anthropic-series-c .\\n17'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Join the text of all page chunks into a single string.\n",
    "docs = \"\".join(page.page_content for page in pages)\n",
    "\n",
    "docs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "15b0afe4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "template = \"\"\"\n",
    "```\n",
    "{context}\n",
    "```\n",
    "Summarize the content of the paper above.\n",
    "\"\"\"\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "0be4367b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "outputParser = StrOutputParser()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "a2ab48b3",
   "metadata": {},
   "outputs": [
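    {
     "data": {
      "text/plain": [
       "'The papers listed above cover a wide range of topics related to ChatGPT, including its technical, social, and legal implications. Here are some key points:\\n\\n* ChatGPT is a large language model developed by OpenAI that can generate human-like text.\\n* It is trained using the Transformer architecture and uses self-supervised learning to improve performance.\\n* ChatGPT performs well on a variety of tasks, including answering questions, writing articles, and translating text.\\n* However, ChatGPT also has limitations, such as potential bias arising from skewed data and limits on fact-checking.\\n* On the social side, ChatGPT has sparked discussion about its ethical impact and how it changes the way we interact with technology.\\n* From a legal perspective, ChatGPT raises concerns about issues such as copyright and intellectual property.\\n* Finally, the papers also highlight the history of generative AI (AIGC), from GAN to ChatGPT.'"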
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pipe the prompt into the model, then parse the reply into a plain string.\n",
    "chain = prompt | model | outputParser\n",
    "chain.invoke({\"context\": docs})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d26662a",
   "metadata": {},
   "source": [
    "## PyPDF Directory\n",
    "\n",
    "Load PDFs from a directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "30097c55",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='KOALA: Self-Attention Matters in Knowledge Distillation of Latent\\nDiffusion Models for Memory-Efficient and Fast Image Synthesis\\nYoungwan Lee1,2Kwanyong Park1Yoorhim Cho3Yong-Ju Lee1Sung Ju Hwang2\\n1Electronics and Telecommunications Research Institute (ETRI), South Korea\\n2Korea Advanced Institute of Science and Technology (KAIST), South Korea\\n3Sookmyung Women’s University, South Korea\\nproject page: https://youngwanlee.github.io/KOALA/\\nAbstract\\nStable diffusion is the mainstay of the text-to-image (T2I)\\nsynthesis in the community due to its generation perfor-\\nmance and open-source nature. Recently, Stable Diffusion\\nXL (SDXL), the successor of stable diffusion, has received a\\nlot of attention due to its significant performance improve-\\nments with a higher resolution of 1024×1024 and a larger\\nmodel. However, its increased computation cost and model\\nsize require higher-end hardware ( e.g., bigger VRAM GPU)\\nfor end-users, incurring higher costs of operation. To ad-\\ndress this problem, in this work, we propose an efficient la-\\ntent diffusion model for text-to-image synthesis obtained by\\ndistilling the knowledge of SDXL. To this end, we first per-\\nform an in-depth analysis of the denoising U-Net in SDXL,\\nwhich is the main bottleneck of the model, and then design\\na more efficient U-Net based on the analysis. Secondly, we\\nexplore how to effectively distill the generation capability\\nof SDXL into an efficient U-Net and eventually identify four\\nessential factors, the core of which is that self-attention is\\nthe most important part. With our efficient U-Net and self-\\nattention-based knowledge distillation strategy, we build\\nour efficient T2I models, called KOALA-1B &-700M, while\\nreducing the model size up to 54% and 69% of the origi-\\nnal SDXL model. In particular, the KOALA-700M is more\\nthan twice as fast as SDXL while still retaining a decent\\ngeneration quality. 
We hope that due to its balanced speed-\\nperformance tradeoff, our KOALA models can serve as a\\ncost-effective alternative to SDXL in resource-constrained\\nenvironments.\\n1. Introduction\\nSince the emergence of the Stable diffusion mod-\\nels (SDMs) [41, 47, 48] which are based on the latent dif-fusion model [46], not only has text-to-image synthesis\\ngreatly advanced but also applications utilizing it have been\\nactively developed, such as image editing [8, 63], control-\\nlable image synthesis [38, 71] personalized image synthe-\\nsis [12, 30, 52], text-to-video generation [3, 25] and 3D as-\\nset synthesis [32, 42, 70]. While these downstream tasks\\nbenefit from SDM’s superior image generation quality as a\\nbackbone, its massive computation costs and large model\\nsize require expensive hardware equipment and thus incur\\nhuge costs. Furthermore, a more recent version of the sta-\\nble diffusion model, SDXL [41], demonstrates significantly\\nimproved image generation quality with a higher resolution\\nof1024×1024 , but at the cost of more computations and\\nmemory requirement.\\nTo alleviate this computation burden, several works\\nhave been proposed, which introduce quantization [59],\\nhardware-aware optimization [7, 9], denoising step reduc-\\ntion [31, 37, 54], and architectural model optimization [26,\\n31]. In particular, the denoising step reduction [31, 37, 54]\\nand architectural model compression [26] methods adopt\\nthe knowledge distillation (KD) scheme [15, 18] by allow-\\ning the model to mimic the output of the SDM as a teacher\\nmodel. The step-distillation methods [31, 37, 54] allow the\\ndenoised latent of the diffusion model in the early denois-\\ning steps to mimic the output in the later denoising steps of\\nthe teacher model. As an orthogonal work for the architec-\\ntural model compression, BK-SDM [26] exploits KD when\\ncompressing the most heavy-weight part, U-Net, in SDM-\\nv1.4 [47]. 
BK-SDM builds a compressed U-Net by simply\\nremoving some blocks and allows the compressed U-Net\\nto mimic the last features at each stage and the predicted\\nnoise of the teacher model during the pre-training phase.\\nHowever, the compression method proposed by BK-SDM\\nachieves a limited compression rate (33%) when applied to\\nthe larger SDXL than SDM-v1.4 and the strategy for feature\\n1arXiv:2312.04005v1  [cs.CV]  7 Dec 2023', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 0}),\n",
       " Document(page_content='Portraitphotoofastandinggirl,photograph,goldenhair,depthoffield,moodylight,goldenhour,centered,extremel-ydetailed,awardwinningphotography,realisticProfessional portrait photo of an anthropomorphic cat wearing fancy gentleman hat and jacket walking in autumn forest.Cute toy owl made of suede, geometric accurate, relief on skin, plastic relief surface of body, intricate details, cinematic\\nAlbertEinsteininasurrealistCyberpunk2077world,hyper-realisticPirateshiptrappedinacosmicmaelstromnebula,renderedincosmicbeachwhirlpoolengine,volumetriclighting,spect-acular,ambientlights,lightpollution,cinematicatmosphere,artnouveaustyle,illustrationartartwork,intricatedetail.Cutesmalldogsittinginamovietheaterwatchingamovie,unrealengine,cozyindoorlighting,artstation,detailed,digit-alpainting,cinematic,characterdesignbypixar,hyperrealis-tic,octanerender\\nFigure 1. Generated samples by our KOALA-700M trained by the proposed knowledge-distillation approach with SDXL [41] . With\\nthe following settings: FP-16 precision, 1024×1024 resolution, and 25 denoising steps with Euler discrete scheduler [24] same as the\\nhuggingface’s SDXL-Base-1.0 model [66], the inference time is 1.4 seconds on an NVIDIA 4090 (24GB) GPU, which is over 2×faster\\nthan SDXL-Base-1.0 (3.3s) while reducing the U-Net model size by 69%.\\ndistillation in U-Net has not yet been fully explored .\\nIn this work, our goal is to build a more efficient text-\\nto-image synthesis model by distilling the generation ca-\\npability of SDXL [41]. To this end, we first perform an\\nin-depth analysis of SDXL’s denoising U-Net, which re-\\nquires the most number of parameters and computational\\ncost, and find that most of the parameters are concentrated\\nat the lowest feature level due to the large number of trans-\\nformer blocks. Based on the analysis, we design an effi-\\ncient U-Net by reducing the origin SDXL’s U-Net by up\\nto 69% (vs. BK’s method: 33%). 
Furthermore, we inves-\\ntigate how to effectively distill SDXL as a teacher model\\nand find four essential factors for feature-level knowledge\\ndistillation. The core of these findings is that self-attention\\nfeatures are the most crucial for distillation due to the factthat self-attention-based KD allows models to learn more\\ndiscriminative representations between objects or attributes.\\nWith our knowledge distillation (KD) strategies, we\\ntrain an efficient text-to-image synthesis model on top of\\nSDXL [41], called KOALA, by only replacing SDXL’s\\nU-Net with our efficient U-Net. KOALA is trained on a\\nsmaller publicly available LAION-Aesthestics-V2-6+ [57],\\nwhich has only 8M text-image pairs. Recent studies [2,\\n28, 67, 68] have shown that FID [17] is not well corre-\\nlated with the fidelity of the generated image, and thus\\nwe use two alternative evaluation metrics: Human Pref-\\nerence Score (HPSv2) [67] for visual aesthetics and T2I-\\nCompbench [21] for image-text alignment. Our efficient\\nKOALA models consistently outperform BK-SDM [26]’s\\nKD method in both metrics. Furthermore, our smaller\\n2', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 1}),\n",
       " Document(page_content='model, KOALA-700M, shows better performance than\\nSDM-v2.0 [48], which is one of the most widely used in\\nthe community, while having a similar model size and in-\\nference speed. Lastly, to validate its practical impact, we\\nperform inference analysis on a variety of consumer-grade\\nGPUs with different memory sizes ( e.g., 8GB, 11GB, and\\n24GB), and the results show that whereas SDXL cannot be\\nmounted on an 8GB GPU, our KOALA-700M can run on\\nit while still retaining decent image generation quality as\\nshown in Fig. 1. Our main contributions are as follows:\\n1. We design two efficient denoising U-Net architectures\\nwith model sizes (1.13B/782M) more than twice as small\\nas SDXL’s U-Net (2.56B).\\n2. We perform a comprehensive analysis of the knowledge\\ndistillation strategies for SDXL, finding four essential\\nfactors for feature distillation.\\n3. We build two efficient T2I models pre-trained by the\\nproposed KD, called KOALA-1B/700M, which is more\\nthan2×smaller and faster compared to SDXL-Base.\\n4. We perform a systemical analysis of inference on a vari-\\nety of GPUs, showing that our KOALA-700M can oper-\\nate on an economical GPU with 8 GB of memory.\\n2. Related Works\\nKnowledge distillation for efficient T2I diffusion mod-\\nels. Denoising diffusion models [20, 61] dominate the\\nrecent state-of-the-art text-to-image (T2I) diffusion mod-\\nels [10, 44, 46, 53] due to their unprecedented high qual-\\nity and diversity. However, a significant shortcoming of\\nthese models is their intensive computational demands dur-\\ning sampling time, which constrains their utility in practical\\nscenarios. To alleviate this problem, early efforts [37, 54]\\nhave focused on improving the sampling speed by reduc-\\ning the number of required sampling steps. In particular,\\nSalimans et al. 
[54] proposes a concept of step distillation ,\\nwhich trains a student model with fewer steps, distilled from\\na pre-trained diffusion model as a teacher model. Meng\\net al. [37] expand this concept to classifier-free guided dif-\\nfusion models, facilitating the step distillation for modern\\ntext-to-image diffusion models. Although these methods\\nsignificantly speed up the inference of models, the hardware\\nprerequisites still pose challenges to practitioners.\\nAs another line of research, BK-SDM [26] attempts\\nan architectural compression of diffusion models. They\\nfirst eliminate redundant network components to construct\\na shallow model. Then, a simple knowledge distilla-\\ntion method [15, 50] is employed to transfer the knowl-\\nedge from the original pre-trained diffusion model ( e.g.,\\nSDM-v1.4 [47]). Remarkably, this compressed model has\\nachieved substantial reductions in sampling time, GPU\\nmemory demands, and storage requirements, with only\\na modest degradation in performance. However, BK-\\nSDM’s simple block removal method has limitations in\\nDW1_R DW2_R DW2_T DW3_R DW3_T MID_R MID_T UP1_R UP1_T UP2_R UP2_T UP3_R\\nStage02004006008001000Parameter (M)SDXL (2.57B)\\nKOALA-1B (1.16B)\\nKOALA-700M (0.78B)Figure 2. Dissection of U-Net in SDXL. DW iand UP iindicate\\ni-thstage of the down and the up block, and R and T denote the\\nResidual block and Transformer block, respectively.\\nSDXL-Base Text Encoder [22, 43] V AE Decoder [4] U-Net\\n#Parameters 817M 83M 2,567M\\nLatency (s) 0.008 0.002 3.133\\nTable 1. SDXL-Base-1.0 model budget. Latency is measured\\nunder the image scale of 1024×1024 , FP16-precision, and 25\\ndenoising steps in NVIDIA 4090 GPU (24GB).\\nU-Net SDM-v2.0 SDXL-Base KOALA-1B KOALA-700M\\nParam. 865M 2,567M 1,161M 782M\\nCKPT size 3.46GB 10.3GB 4.4GB 3.0GB\\nTx blocks [1, 1, 1, 1] [0, 2, 10] [0, 2, 6] [0, 2, 5]\\nMid block ✓ ✓ ✓ ✗\\nLatency 1.131s 3.133s 1.604s 1.257s\\nTable 2. U-Net Comparison. Tx means Transformer. 
SDM-\\nv2.0 [48] uses 768×768 resolution, while SDXL and KOALA\\nmodels use 1024×1024 resolution. Latency is measured\\nwith FP16-precision, and 25 denoising steps in NVIDIA 4090\\nGPU (24GB). CKPT means the trained checkpoint file.\\ncompressing more complex and larger U-Net models such\\nas SDXL [41]. Beyond the BK-SDM method, which only\\ndistills the last feature at each stage, there is still room for\\nfurther exploration in distilling knowledge from more com-\\nplex U-Net in SDXL.\\n3. Analysis: Stable Diffusion XL\\nSDXL [41], the latest version of the SDM series [46–48],\\nexerts a significant influence on both the academic com-\\nmunity and the industry due to its unprecedented quality\\nand open source resources. It has several key improve-\\nment points from the previous SDM-v2.0 [48], e.g., multi-\\nple sizes- & crop-conditioning, improved V AE [27, 45], and\\nmuch larger U-Net [51], and an ad hoc style of refinement\\nmodule, which leads to significantly improving generation\\n3', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 2}),\n",
       " Document(page_content='CR1,1R1,2CDwR2,1Ts2,1R2,2Ts2,2CDwR3,1Ts3,1R3,2Ts3,2R4,1Ts4,1R4,2CR6,1Ts6,1R6,2Ts6,2R6,3Ts6,3CUpR7,1R7,2R7,3TeacherSDXL-Base(2.56B)StudentKOALA-1B(1.16B)KOALA-700M(782M)320×64×64CR1,1CDwR2,1Ts2,1CDwR3,1Ts3,1R4,1Ts4,1R4,2R5,1Ts5,1R5,2Ts5,2R5,3Ts5,3CUpCDW-1DW-2DW-3MidUP-1UP-2UP-3\\nDW-1DW-2DW-3MidR5,1Ts5,1R5,2Ts5,2CUpR6,1Ts6,1R6,2Ts6,2CUpUP-2R7,1R7,2UP-3CR1,1CDwR2,1Ts2,1CDwR3,1Ts3,1CDW-1DW-2DW-3R5,1Ts5,1R5,2Ts5,2CUpR6,1Ts6,1R6,2Ts6,2CUpUP-2R7,1R7,2640×32×321280×32×321280×32×321280×64×64640×128×128320×128×128320×128×128\\nUP-1UP-1UP-3RResBlockTsTransformerBlocks==Depth = 2TsTransformerBlocks=Depth = 6TsTransformerBlocks=Depth = 10TsTransformerBlocks=Depth = 5CCDwCUpConvolution Blocklatent!latent!PredictednoisePredictednoise4×128×128\\n4×128×1284×128×128\\n4×128×128Self-attention-basedknowledge distillation(Sec. 4.2)Figure 3. Overview of KnOwledge-DistillAtion in LAtent diffusion model based on SDXL and architecture of KOALA. We omit\\nskip connections for simplicity. We perform feature distillation in transformer blocks using the output of the self-attention layer.\\nquality. However, the significant enlargement of U-Net in\\nmodel size results in increased computational costs and sig-\\nnificant memory (or storage) requirements, hampering the\\naccessibility of SDXL. Thus, we investigate the U-Net in\\nSDXL regarding model size and latency to design a more\\nlightweight U-Net for knowledge distillation. We dissect\\nthe components of SDXL, quantifying its size and latency\\nduring the denoising phase, as detailed in Tab. 1. The en-\\nlarged U-Net (2.56B) is the primary cause of the increasing\\nSDXL model size (vs. SDM-v2.0 (865 M)). Furthermore,\\nthe latency of U-Net is the main inference time bottleneck in\\nSDXL. 
Therefore, it is necessary to reduce U-Net’s model\\nbudget for better efficiency.\\nThe SDXL’s U-Net architecture varies in the number of\\ntransformer blocks for each stage, unlike SDM-v2.0, which\\nemploys a transformer block for each stage (see Tab. 2). At\\nthe highest feature levels ( e.g.,DW-1&UP-3 in Fig. 3),\\nSDXL uses only residual blocks [14] without transformer\\nblocks, instead distributing more transformer blocks to\\nlower-level features. So, in Fig. 2, we analyze the parame-\\nter distribution of each stage in the U-Net. Most parameters\\nare concentrated on the transformers with ten blocks in the\\nlowest feature map ( e.g.,32×32ofDW-3 ,Mid,UP-1\\nin Fig. 3), making the main parameter bottleneck. Thus,\\nit is essential to address this bottleneck when designing an\\nefficient U-Net.\\n4. Approach\\nIn this section, we first propose a simple yet efficient U-Net\\narchitecture in Sec. 4.1. Then, we explore how to effectively\\ndistill the knowledge from U-Net in SDXL [41] into the\\nproposed efficient U-Net in Sec. 4.2.4.1. Efficient U-Net architecture\\nBased on the investigation of the U-Net model budget of\\nSDXL in Sec. 3, we propose a simple yet efficient U-Net\\narchitecture. BK-SDM [26] also aimed to compress the\\nU-Net of SDM-v1.4 [47] by removing a pair of a residual\\nblock and a transformer block at each stage. However, the\\nBK-SDM’s approach is only suitable for SDM-v1.4, which\\nhas only one transformer block ( i.e., depth=1) at each stage.\\nAs SDXL’s U-Net has a different number of transformer\\nblocks at each stage, as shown in Tab. 2, simple block-level\\nremoval (one block pair) can only reduce SDXL’s U-Net to\\nat most 1.3B model parameters.\\nIn this work, we devise a compressed U-Net that is\\nmore suitable for SDXL [41] compared to that of BK-\\nSDM [26]. Similar to BK-SDM, we first remove the\\nresidual-transformer blocks pair at each stage. 
Specifically,\\nin the encoder part ( DW-i ), each stage has two alternat-\\ning pairs of a residual block and transformer blocks. We\\nremove the last pair of residual-transformer blocks at each\\nstage. In the decoder part ( UP-i ), we remove the inter-\\nmediate pair of residual-transformer blocks. Furthermore,\\nfocusing on the fact that the majority of the parameters are\\nconcentrated on the transformer blocks at the lowest fea-\\ntures (Fig. 2), we reduce the depth of the transformer blocks\\nfrom 10 to 5 or 6 at the lowest features ( i.e.,DW-3 ,Mid\\nandUP-1 in Fig. 3). As a result, we design two types of\\ncompressed U-Net, KOALA-1B and KOALA-700M. More\\ndetails of the proposed U-Nets are demonstrated in Tab. 2\\nand Fig. 3. Note that we remove Mid block in KOALA-\\n700M for additional model compression. Our KOALA-1B\\nmodel has 1.16B parameters, making it twice as compact as\\nSDXL (2.56B). Meanwhile, KOALA-700M, with its 782M\\nparameters, is comparable in size to SDM-v2.0 (865M). It\\nis noteworthy that KOALA-700M achieves almost twice the\\n4', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 3}),\n",
       " Document(page_content='Distill type HPSv2\\nSD-loss 25.53\\nSA 26.74\\nCA 26.11\\nRes 26.27\\nFFN 26.48\\nLF (BK [26]) 26.63\\n(a)Distillation type.Distill loc. HPSv2\\nSD-loss 25.53\\nDW-2 25.32\\nDW-3 25.57\\nMid 25.66\\nUP-1 26.52\\nUP-2 26.05\\n(b)Distill stage. We distill the SA\\nfeature at only each stage.SA loc. HPSv2\\nSA-bottom 26.74\\nSA-interleave 26.58\\nSA-up 26.48\\n(c)SA locations to distill from a\\ntransformer block with a depth of\\n10 in Teacher U-Net in SDXL.Combination HPSv2\\nBaseline (SA only) 26.74\\nSA + LF at DW-1 & UP-3 26.98\\nSA + Res at DW-1 & UP-3 26.94\\nSA + LF all 26.83\\nSA + Res all 26.80\\nSA+CA+Res+FFN+LF all 26.39\\n(d)Combination. DW-1 & UP-3 are the\\nhighest feature resolution in U-Net.\\nTable 3. Analysis of feature level knowledge distillation of U-Net in SDXL [41]. SA, CA, and FFN denote self-attention, cross-attention,\\nand feed-forward net in the transformer block. Res is a convolutional residual block and LF denotes the last feature (same in BK [26]). For\\nthe ablation study, we train our KOALA-1B as student U-Net for 30K iterations with a batch size of 32.\\nfaster inference speed over SDXL, as well as comparable to\\nSDM-v2.0, which generates lower-resolution images.\\n4.2. Exploring Knowledge distillation for SDXL\\nNow we explore how to effectively distill the knowledge\\nof U-Net in SDXL [41] into the proposed compact U-Net\\ndescribed in Sec. 4.1. As a latent diffusion model [46],\\nSDXL encodes input x∈R3×1024×1024to latent represen-\\ntation z∈R4×128×128via V AE [27, 45] to reduce compu-\\ntation cost for high-resolution generation. The latent rep-\\nresentation zis then fed into the U-Net [51] to predict the\\nnoise ( ϵ), which is the most essential part of generating the\\nimage and also the most computationally intensive. We re-\\nplace this U-Net in SDXL with the proposed efficient U-\\nNet and then train the replaced U-Net through knowledge-\\ndistillation-based learning.\\nKim et al. 
[26] adopted knowledge distillation for text-\\nto-image pre-training by using the U-Net in SDM-v1.4 [47]\\nas a teacher model ( Tθ) through the following objectives\\nLtask+LoutKD +LfeatKD :\\nLtask= min\\nSθEzt,ϵ,t,c||ϵt−ϵSθ(zt, t, c)||2\\n2, (1)\\nLoutKD = min\\nSθEz,ϵ,t,c||ϵTθ(z, t, c )−ϵSθ(z, t, c )||2\\n2,(2)\\nLfeatKD = min\\nSθEz,ϵ,c,t||X\\nifi\\nT(zt, t, c)−fi\\nS(zt, t, c)||2\\n2,\\n(3)\\nwhere Sθis the compressed U-Net as a student model, ϵt\\nis the ground-truth sampled Gaussian noise at timestep t,c\\nis text embedding as a condition, ϵSθ(·)andϵTθ(·)denote\\nthe predicted noise from each U-Net in teacher and student\\nmodel, respectively. Ltaskis the task loss for the reverse de-\\nnoising process [20], Loutis the output-KD loss [18] com-\\nputed between the predicted noises from teacher and student\\nrespectively, and Lfeatis the feature-wise KD loss [15, 50]\\ncomputed between the last features fi\\nT(·)andfi\\nT(·)ati-\\nstage from teacher and student models, respectively.The feature-wise distillation literature [6, 13, 15, 39, 50]\\nshows that intermediate features play a more critical role\\nin knowledge distillation than output KD [18]. For the\\nfeature-wise KD-loss, BK-SDM [26] considers only the\\nlast feature map at each stage. However, the denois-\\ning U-Net consists of several types of features, such as\\nself-attention ( SA), cross-attention ( CA), and feedforward\\nnet (FFN) in the transformer block, and convolutional resid-\\nual block ( Res). This means that the feature distillation\\napproach for text-to-image diffusion models has not been\\nsufficiently explored , leaving room for further investigation.\\nIn this work, we have performed an in-depth analysis of\\nfeature distillation in the U-Net of SDXL [41] as shown\\nin Tab. 3 and observed four important findings . To this\\nend, we ablate feature distillation strategies by using our ef-\\nficient U-Net (KOALA-1B) as the student model and the\\nU-Net of SDXL as the teacher model. 
More training details\\nare described in Sec. 5.1. We start from a baseline trained\\nonly by Ltaskand add LfeatKD without LoutKD to validate the\\neffect of feature distillation.\\nF1. Which feature type is effective for distillation? BK-\\nSDM [26] demonstrated that distilling the last features ( LF)\\nat U-Net stages benefits overall performance, when applied\\nto shallow U-Net of early SDM-v1.4 [47]. However, with\\nthe increasing complexity of U-Net and its stage, relying\\nsolely on LFmay not be sufficient to mimic the intricate\\nbehavior of the teacher U-Net. To this end, we revisit\\nwhich features provide the richest guidance for effective\\nknowledge distillation. We focus on key intermediate fea-\\ntures from each stage: outputs from the SA,CA, and FFN\\nlayers in the transformer block, as well as outputs from\\nRes andLF. Tab. 3a summarizes the experimental results.\\nWhile all types of features help obtain higher performance\\nover the na ¨ıve baseline with only the task loss, distilling\\nself-attention features achieves the most performance gain.\\nConsidering the prior studies [29, 60, 62] which suggest that\\nSAplays a vital role in capturing semantic affinities and the\\noverall structure of images, the results emphasize that such\\ninformation is crucial for the distillation process.\\n5', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 4}),\n",
       " Document(page_content='LF (BK) SA (Ours)\\n(a) Generated Image (b) PCA Analysis\\n (c) Self-attention visualizationDW2 DW3 MID UP1 UP2“a big hippopotamus and a small rabbit.” \\nFigure 4. Analysis on self-attention maps of distilled student U-Nets. (a) Generated images of LF- and SA-based distilled models,\\nwhich are BK-SDM [26] and our proposal, respectively. In BK-SDM’s result, a rabbit is depicted like a hippopotamus ( i.e., appearance\\nleakage). (b) Visualization of PCA analysis results on self-attention maps of UP-1 stage. (c) Representative visualization of self-attention\\nmap from different U-Net stages. Red boxes denote the query patches. Note that from the MID stage, the SA-based model attends to the\\nrabbit more discriminatively than the LFmodel, demonstrating that self-attention-based KD allows to generate objects more distinctly.\\nTo understand the effects more clearly, we illustrate a\\nrepresentative example in the Fig. 4. To reason about\\nhow the distilled student U-Net captures self-similarity, we\\nperform a PCA analysis [23, 63] on self-attention maps.\\nSpecifically, we apply PCA on self-attention maps from SA-\\nandLF-based models and show the top three principal com-\\nponents in Fig. 4-(b). Interestingly, in the SA-based model,\\neach principal component distinctly represents individual\\nobjects ( i.e., unique color assignments to each object). This\\nindicates that the SA-based model effectively distinguishes\\ndifferent objects in modeling self-similarity, which plays a\\ncrucial role in accurately rendering the distinct appearance\\nof each object. In contrast, the LF-based model exhibits less\\ndistinction between objects, resulting in appearance leak-\\nagebetween them ( e.g., small hippo with rabbit ears).\\nF2. Which stage is most effective for distillation? Based\\non the first finding (F1), we further explore the role and\\nsignificance of each self-attention stage. 
To this end, we\\nfirst visualize the self-attention map in Fig. 4-(c). The\\nself-attention maps initially capture general contextual in-\\nformation ( e.g.,DW-2&DW-3 ) and gradually focus on lo-\\ncalized semantics ( e.g.,MID). In the decoder, self-attentions\\nincreasingly correlate with higher-level semantics ( e.g., ob-\\nject) to accurately model appearances and structures. No-\\ntably, in this stage, the SA-based model attends correspond-\\ning object regions (given the query patch, red box) more\\ndiscriminatively than the LF-based model, which results in\\nimproved compositional image generation performance.\\nIn addition, we ablate the significance of each self-\\nattention stage in the distillation process. Specifically, we\\nadopt an SA-based loss at a single stage alongside the task\\nloss. As shown in Tab. 3b, the results align with the aboveunderstanding: distilling self-attention knowledge within\\nthedecoder stages significantly enhances generation qual-\\nity. In comparison, the impact of self-attention solely within\\nthe encoder stages is less pronounced. Consequently, we\\nopt to retain more SAlayers within the decoder (see Fig. 3).\\nF3. Which SA’s location is effective in the trans-\\nformer blocks? At the lowest feature level, the depth of\\nthe transformer blocks is 6 for KOALA-1B, so we need\\nto decide which locations to distill from the 10 trans-\\nformer blocks of teacher U-Net. We assume three cases\\nfor each series of transformer blocks; (1) SA-bottom :\\n{fl\\nT|l∈ {1,2,3,4,5}}, (2)SA-interleave :{fl\\nT|\\nl∈ {1,3,5,7,9,10}}, and (3) SA-up :{fl\\nT|l∈\\n{6,7,8,9,10}}where lis the number of block. Tab. 3c\\nshows that SA-bottom performs the best while SA-up\\nperforms the worst. This result suggests that the features of\\nthe early blocks are more significant for distillation. A more\\nempirical analysis is described in Appendix B.2. Therefore,\\nwe adopt the SA-bottom strategy in all experiments.\\nF4. 
Which combination is the best? In SDXL’s U-Net,\\nas shown in Fig. 3, there are no transformer blocks at the\\nhighest feature levels ( e.g.,DW-1&UP-3 ); consequently,\\nself-attention features cannot be distilled at this stage.\\nThus, we try two options: the residual block ( Res at\\nDW-1&UP-3 ) and the last feature ( LF at DW-1&UP-3 )\\nas BK-SDM [26]. To this end, we perform SA-based fea-\\nture distillation at every stage except for DW-1 andUP-3 ,\\nwhere we use the above two options, respectively. In\\naddition, we try additional combinations: SA+LF all ,\\nSA+Res all , and SA+CA+Res+FFN+LF all where\\nall means all stages). Tab. 3d demonstrates that adding\\nmore feature distillations to the SA-absent stage ( e.g.,\\n6', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 5}),\n",
       " Document(page_content='Model#Param. HPSv2 Attribute Object RelationshipComplex Average\\nWhole/U-Net Anime Concept-art Paintings Photo Average Color Shape Texture Spatial Non-spatial\\nSDM-v1.4 [47] 1.04B 860M 27.26 26.61 26.66 27.27 26.95 0.3765 0.3576 0.4156 0.1246 0.3079 0.308 0.3150\\nSDM-v2.0 [48] 1.28B 865M 27.48 26.89 26.86 27.27 27.13 0.5065 0.4221 0.4922 0.1342 0.3096 0.3386 0.3672\\nDALLE-2 [44] 6.5B - 27.34 26.54 26.68 27.24 26.95 0.5750 0.5464 0.6374 0.1283 0.3043 0.3696 0.4268\\nSDXL-Base-1.0 [41] 3.46B 2.6B 27.69 27.44 27.50 28.29 27.73 0.6369 0.5408 0.5637 0.2032 0.3110 0.4091 0.4441\\nBK-SDM-S [26] 655M 483M 26.64 26.77 26.87 26.61 26.72 0.3984 0.3783 0.4225 0.0731 0.3003 0.3695 0.3237\\nOurs -SDM-S 655M 483M 26.73 26.95 27.00 26.74 26.86 0.4386 0.3950 0.4549 0.0832 0.3007 0.3777 0.3417\\nBK-SDM-B [26] 752M 580M 27.01 26.64 27.06 26.63 26.84 0.4192 0.4096 0.4409 0.0979 0.3077 0.3052 0.3301\\nOurs -SDM-B 752M 580M 26.79 27.12 27.11 26.79 26.95 0.4436 0.4338 0.4680 0.1077 0.3090 0.3872 0.3582\\nBK-SDXL-700M 1.68B 782M 27.59 27.13 27.17 27.14 27.26 0.5202 0.4506 0.4564 0.1360 0.3008 0.3699 0.3723\\nKOALA-700M 1.68B 782M 27.65 27.28 27.58 27.21 27.43 0.5068 0.4731 0.4674 0.1535 0.3008 0.3731 0.3791\\nBK-SDXL-1B 2.06B 1.16B 27.52 26.90 27.17 26.87 27.12 0.4876 0.4498 0.4578 0.1551 0.3035 0.3777 0.3719\\nKOALA-1B 2.06B 1.16B 27.73 27.26 27.61 27.16 27.44 0.5223 0.5108 0.4864 0.1563 0.3019 0.3697 0.3912\\nTable 4. Visual aesthetics evaluation using HPSv2 [67] (Left) and Image-text alignment evaluation using T2I-CompBench [21] (Right).\\nNote that BK-SDXL-1B/700M are implemented by ourselves using our efficient U-Net with the BK-SDM [26]’s distillation method. For\\nfair comparison with other methods, we use 50 denoising steps with Euler discrete scheduler [24] same as the huggingface’s SDXL-Base-\\n1.0 model [66].\\nDW-1&UP-3 ) consistently boots performance, and espe-\\ncially LF at DW1&UP3 shows the best. 
Interestingly,\\nboth+LF all and+Res all are worse than the ones\\nat only DW-1 &UP-3 andSA+CA+Res+FFN+LF all is\\nalso not better, demonstrating that the SA features are not\\ncomplementary to the other features.\\nWith these findings, we build a KnOwledge-distill Ation-\\nbased LAtent diffusion model with our efficient U-Net,\\ncalled KOALA. We train our KOALA models with the fol-\\nlowing objectives: Ltask+LoutKD +LfeatKD where we ap-\\nply our findings to LfeatKD . As shown in Tab. 2, we de-\\nsign two models, KOALA-1B and KOALA-700M, based\\non SDXL [41], with U-Net model sizes of 1.16B and 782M,\\nrespectively.\\n5. Experiments\\n5.1. Implementation details\\nDataset. Since the dataset used to train SDXL [41] is not\\npublicly available ( i.e., internal data), we train the proposed\\nefficient U-Net in SDXL on publicly available LAION-\\nAesthetics V2 6+ [55, 57, 58] for reproducibility. As the\\ndataset contains some blank text and corrupted images, we\\nfilter the data and collect 8,483,623 image-text pairs.\\nTraining. We use the officially released SDXL-Base-\\n1.0 [40] and the training settings, while only replacing\\nits U-Net with our efficient U-Net. We use the same\\ntwo text encoders used in SDXL, which are OpenCLIP\\nViT-bigG [22] and CLIP ViT-L [43]. For V AE, we use\\nsdxl-vae-fp16-fix [4], which enables us to use FP16\\nprecision for V AE computation. We initialize the weights\\nof our U-Net with the teacher’s U-Net weights at the same\\nblock location. We train our KOALA models for 100K\\niterations with a batch size of 128 using four NVIDIA\\nA100 (80GB) GPUs. More details are described in A. For\\na fair comparison to our counterpart BK-SDM [26], we\\ntrain our efficient U-Nets with their distillation method un-\\nder the same data setup ( e.g., BK-SDXL-1B and -700Min Tab. 4). Furthermore, we also train SDM-Base and SDM-\\nSmall in BK-SDM [26] with our approach (Ours-SDM-\\nBase & Ours-SDM-Small in Tab. 
4), following the BK-\\nSDM training recipe.\\nEvaluation metric. Recently, several works [2, 41, 67,\\n68] have claimed that FID [17] is not closely correlated\\nwith visual fidelity because a feature extractor for FID\\nis pre-trained on the ImageNet [11] dataset, which does\\nnot overlap much with the datasets used to train recent\\ntext-to-image models ( e.g., style, types, resolution, etc.).\\nTherefore, instead of FID, we use Human Preference\\nScore (HPSv2) [67] as a visual aesthetics metric, which\\nallows us to evaluate visual quality in terms of more spe-\\ncific types. For image-text alignment, we use the T2I-\\ncompbench [21], which is a more comprehensive bench-\\nmark for evaluating the compositional text-to-image gener-\\nation capability than the single CLIP score [16].\\n5.2. Main results\\nVisual aesthetics. We compare our KOALA-700M/1B\\nmodels against state-of-the-art text-to-image models, in-\\ncluding popular open-sourced Stable diffusion models se-\\nries [41, 47, 48] and DALLE-2 [44]. Tab. 4 summarizes the\\nresults. Our KOALA-700M & KOALA-1B models based\\non SDXL [41] consistently achieve a higher HPS average\\nscore than the BK [26] models (BK-SDXL-700M & 1B)\\nequipped with our efficient U-Net. Moreover, for SDM,\\nOurs-SDM-Base & Smalll models using BK’s compressed\\nU-Net in SDM-v1.4 [47] still outperform BK-SDM-Base &\\nSmalll [26]. These results demonstrate that the proposed\\ndistillation of the self-attention layer is more helpful for\\nvisual aesthetics than the last layer feature distillation by\\nBK [26]. In addition, our KOALA models achieve a higher\\nquality score than DALLE-2 [44], which has a much larger\\nmodel size (6.5B). Furthermore, our KOALA-700M sur-\\npasses SDM-v2.0 [48] with a comparable U-Net size, which\\nis widely used in the community. In Appendix C, we pro-\\nvide the qualitative comparisons to DALLE-2, SDM-v2.0,\\n7', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 6}),\n",
       " Document(page_content='OOM/OOMOOM/6.3GBOOM/7.2GBOOM/7.7GBOOM/10.5GB9.2GB/6.1GBOOM/9.7GB10.8GB/8.9GB22.8GB/13.0GB10.8GB/6.6GB17.3GB/10.2GB15.7GB/9.5GBMemory:Figure 5. Latency and memory usage comparison on different GPUs : NVIDIA 3060Ti (8GB), 2080Ti (11GB), and 4090 (24GB). OOM\\nmeans Out-of-Memory . We measure the inference time of SDM-v2.0 with 768×768resolution and the other models with 1024×1024 .\\nWe use 25 denoising steps and FP16/FP32 precisions. Note that SDXL-Base cannot operate in the 8GB-GPU.\\nLimitations (Ours-700M)\\nA 3d art character of a cute kitty holding a sign that says\"Let there be peace\", pixar styleA baby penguinwearing a blue hatand red gloves\\nFigure 6. Failure cases of KOALA-700M\\nand SDXL-Base-1.0, supporting the quantitative results.\\nImage-text alignment. As shown in Tab. 4 (Right), our ap-\\nproach with both SDXL and SDM consistently surpasses\\nthe counterpart BK method [26] in terms of text-image\\nalignment. We conjecture that this is because our self-\\nattention-based KD approach allows the model to learn\\nmore discriminative representations between objects or at-\\ntributes, as demonstrated in Sec. 4.2 and Fig. 4. Mean-\\nwhile, unlike the aesthetics results (HPSv2), our models\\nlag behind DALLE-2 regarding the average score of the\\nCompBench. In attribute binding (color, shape, and tex-\\nture), our model lags behind DALLE-2 but outperforms in\\nobject-relationship metrics. We speculate that the different\\ntendency between DALLE-2 and our model may stem from\\ndata used for training. Because the LAION-Aesthetics V2\\n6+ [57] data we used focuses on higher aesthetic images\\nthan multiple objects with various attributes, our model is\\nvulnerable to texts with different attribute properties.\\n5.3. 
Model budget comparison\\nWe further validate the efficiency of our model by mea-\\nsuring its inference speed on a variety of consumer-gradeGPUs with different memory sizes, such as 8GB (3060Ti),\\n11GB (2080Ti), and 24GB (4090), because the GPU en-\\nvironment varies for each user. Fig. 5 illustrates infer-\\nence speed and GPU memory usage on different GPUs\\nwith both FP16 and FP32 precisions. For this experiment,\\nwe compare against the most popular open-sourced mod-\\nels, SDM-v2.0 [48] and SDXL-Base-1.0 [41]. On the 8GB\\nGPU, SDXL does not fit , but the other models can run in\\nFP16 precision. Notably, KOALA-700M generates higher-\\nresolution images of superior quality at a comparable in-\\nference speed to SDM-v2.0. On the 11GB GPU, SDXL\\ncan run with FP16 precision, and on the 24 GB, it can run\\nat 9.66s and 3.3s with both FP16 and FP32 precision, re-\\nspectively. On the other hand, our KOALA-700M runs at\\n3.94s and 1.42s, which is 2×faster than SDXL. Overall,\\nour KOALA-700M is the best alternative for high-quality\\nimage generation that can replace SDM-v2.0 and SDXL in\\nresource-constrained GPU environments.\\n6. Limitations and Future Work\\nWhile our KOALA models generate images with impres-\\nsive aesthetic quality, such as the photo-realistic or 3d-art\\nrenderings shown in Fig. 1, it still shows limitations in sev-\\neral specific cases:\\nRendering long legible text. Our models have difficulty in\\nsynthesizing legible texts in the generated image. For exam-\\nple, it renders unintended letters or generates unintelligible\\nletters, as shown in Fig. 6 (Left).\\nComplex prompt with multiple attributes. When at-\\ntempting to compose an image using prompts that include\\nvarious attributes of an object or scene, KOALA sometimes\\ngenerates instances that do not perfectly follow the intended\\ndescription. For example, as shown in Fig. 
6 (Right), when\\nwe configure the penguin to wear a blue hat and red gloves,\\nonly the blue hat attribute is applied, while the red gloves\\nare not.\\n8', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 7}),\n",
       " Document(page_content='We conjecture that these limitations may stem from the\\ndataset, LAION-aesthetics-V2 6+ [57], we used to train,\\nwhose text prompts are relatively shorter (lacking detail)\\nand messy ( e.g., HTML code). Recent works [1, 69] also\\npointed out this issue and showed that utilizing machine-\\ngenerated detailed captions ( i.e., synthesized captions) im-\\nproves the fine-grained text-alignment of T2I models. For\\nfuture work, it will also be interesting to see the synergies\\nbetween our efficient T2I model and such large multimodal\\nmodels [5, 33, 34]-based recaptioning techniques. More\\nfailure cases are illustrated in Appendix C.4.\\n7. Conclusion\\nIn this work, we propose KOALA, an efficient text-to-image\\nsynthesis model, offering a compelling alternative between\\nSDM-v2.0 and SDXL in resource-limited environments. To\\nachieve this, we devise more compact U-Nets by effectively\\ncompressing the main computational bottlenecks present\\nin SDXL. In doing so, we demonstrate that self-attention-\\nbased knowledge distillation is one of the most crucial com-\\nponents to enhance the quality of generated images. With\\nthese contributions, our KOALA-700M model substantially\\nreduces the model size (69% ↓) and the latency (60% ↓) of\\nSDXL, while exhibiting decent aesthetic generation quality.\\n8. Acknowledgments\\nThis work was supported by Institute of Information &\\ncommunications Technology Planning & Evaluation (IITP)\\ngrant funded by the Korea government (MSIT) (No. RS-\\n2022-00187238, Development of Large Korean Language\\nModel Technology for Efficient Pre-training).\\nReferences\\n[1] James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng\\nWang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee,\\nYufei Guo, Wesam Manassra, Prafulla Dhariwal, Casey Chu,\\nYunxin Jiao, and Aditya Ramesh. Improving image genera-\\ntion with better captions. https://cdn.openai.com/\\npapers/dall-e-3.pdf , 2023. 
9\\n[2] Eyal Betzalel, Coby Penso, Aviv Navon, and Ethan Fe-\\ntaya. A study on the evaluation of generative models. arXiv\\npreprint arXiv:2206.10935 , 2022. 2, 7\\n[3] Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dock-\\nhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis.\\nAlign your latents: High-resolution video synthesis with la-\\ntent diffusion models. In CVPR , 2023. 1\\n[4] Ollin Boer Bohan. Sdxl-vae-fp16-fix. https :\\n//huggingface.co/madebyollin/sdxl- vae-\\nfp16-fix , 2023. 3, 7, 12\\n[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub-\\nbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakan-\\ntan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Lan-\\nguage models are few-shot learners. Advances in neural in-\\nformation processing systems , 33:1877–1901, 2020. 9[6] Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, and\\nDacheng Tao. Learning student networks via feature embed-\\nding. IEEE Transactions on Neural Networks and Learning\\nSystems , 32(1):25–35, 2020. 5\\n[7] Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang,\\nChuo-Ling Chang, Andrei Kulik, and Matthias Grundmann.\\nSpeed is all you need: On-device acceleration of large\\ndiffusion models via gpu-aware optimizations. In CVPR-\\nWorkshop , 2023. 1\\n[8] Jooyoung Choi, Yunjey Choi, Yunji Kim, Junho Kim, and\\nSungroh Yoon. Custom-edit: Text-guided image editing with\\ncustomized diffusion models. In CVPR-Workshop , 2023. 1\\n[9] Benjamin Consolvo. Text-to-image stable diffusion with sta-\\nbility ai and compvis on the latest intel gpu. https://\\nmedium.com/intel-analytics-software , 2022.\\n1\\n[10] DeepFloyd. Deepfloyd. https://www.deepfloyd.\\nai/, 2022. 3\\n[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li,\\nand Li Fei-Fei. Imagenet: A large-scale hierarchical image\\ndatabase. In CVPR , 2009. 
7\\n[12] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik,\\nAmit Haim Bermano, Gal Chechik, and Daniel Cohen-or.\\nAn image is worth one word: Personalizing text-to-image\\ngeneration using textual inversion. In ICLR , 2023. 1\\n[13] Jianping Gou, Baosheng Yu, Stephen J Maybank, and\\nDacheng Tao. Knowledge distillation: A survey. Interna-\\ntional Journal of Computer Vision , 2021. 5\\n[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.\\nDeep residual learning for image recognition. In CVPR ,\\n2016. 4\\n[15] Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, No-\\njun Kwak, and Jin Young Choi. A comprehensive overhaul\\nof feature distillation. In ICCV , 2019. 1, 3, 5\\n[16] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras,\\nand Yejin Choi. Clipscore: A reference-free evaluation met-\\nric for image captioning. In EMNLP , 2021. 7\\n[17] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner,\\nBernhard Nessler, and Sepp Hochreiter. Gans trained by a\\ntwo time-scale update rule converge to a local nash equilib-\\nrium. In NeurIPS , 2017. 2, 7\\n[18] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distill-\\ning the knowledge in a neural network. arXiv preprint\\narXiv:1503.02531 , 2015. 1, 5\\n[19] Jonathan Ho and Tim Salimans. Classifier-free diffusion\\nguidance. arXiv preprint arXiv:2207.12598 , 2022. 12, 13,\\n15, 16, 17, 18\\n[20] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu-\\nsion probabilistic models. In NeurIPS , 2020. 3, 5, 12\\n[21] Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, and Xi-\\nhui Liu. T2i-compbench: A comprehensive benchmark\\nfor open-world compositional text-to-image generation. In\\nNeurIPS , 2023. 2, 7\\n[22] Gabriel Ilharco, Mitchell Wortsman, Nicholas Carlini,\\nRohan Taori, Achal Dave, Vaishaal Shankar, Hongseok\\nNamkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi,\\nand Ludwig Schmidt. Openclip. https://github.\\ncom/mlfoundations/open_clip , 2021. 
3, 7, 12\\n9', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 8}),\n",
       " Document(page_content='[23] Ian T Jolliffe and Jorge Cadima. Principal component analy-\\nsis: a review and recent developments. Philosophical trans-\\nactions of the royal society A: Mathematical, Physical and\\nEngineering Sciences , 374(2065):20150202, 2016. 6, 12\\n[24] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine.\\nElucidating the design space of diffusion-based generative\\nmodels. NeurIPS , 2022. 2, 7, 12, 13, 15, 16, 17, 18\\n[25] Levon Khachatryan, Andranik Movsisyan, Vahram Tade-\\nvosyan, Roberto Henschel, Zhangyang Wang, Shant\\nNavasardyan, and Humphrey Shi. Text2video-zero: Text-\\nto-image diffusion models are zero-shot video generators. In\\nICCV , 2023. 1\\n[26] Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, and\\nShinkook Choi. On architectural compression of text-to-\\nimage diffusion models. arXiv preprint arXiv:2305.15798 ,\\n2023. 1, 2, 3, 4, 5, 6, 7, 8, 12, 13, 14\\n[27] Diederik P Kingma and Max Welling. Auto-encoding varia-\\ntional bayes. arXiv preprint arXiv:1312.6114 , 2013. 3, 5\\n[28] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Ma-\\ntiana, Joe Penna, and Omer Levy. Pick-a-pic: An open\\ndataset of user preferences for text-to-image generation. In\\nICCV , 2023. 2\\n[29] Nicholas Kolkin, Jason Salavon, and Gregory\\nShakhnarovich. Style transfer by relaxed optimal transport\\nand self-similarity. In CVPR , 2019. 5\\n[30] Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli\\nShechtman, and Jun-Yan Zhu. Multi-concept customization\\nof text-to-image diffusion. In CVPR , 2023. 1\\n[31] Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys,\\nYun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. Snap-\\nfusion: Text-to-image diffusion model on mobile devices\\nwithin two seconds. arXiv preprint arXiv:2306.00980 , 2023.\\n1\\n[32] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa,\\nXiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler,\\nMing-Yu Liu, and Tsung-Yi Lin. 
Magic3d: High-resolution\\ntext-to-3d content creation. In CVPR , 2023. 1\\n[33] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee.\\nImproved baselines with visual instruction tuning, 2023. 9\\n[34] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee.\\nVisual instruction tuning, 2023. 9\\n[35] Ilya Loshchilov and Frank Hutter. Decoupled weight decay\\nregularization. arXiv preprint arXiv:1711.05101 , 2017. 12\\n[36] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan\\nLi, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion\\nprobabilistic model sampling in around 10 steps. Advances\\nin Neural Information Processing Systems , 35:5775–5787,\\n2022. 14\\n[37] Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik\\nKingma, Stefano Ermon, Jonathan Ho, and Tim Salimans.\\nOn distillation of guided diffusion models. In CVPR , 2023.\\n1, 3\\n[38] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian\\nZhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. T2i-\\nadapter: Learning adapters to dig out more controllable\\nability for text-to-image diffusion models. arXiv preprint\\narXiv:2302.08453 , 2023. 1[39] Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, and\\nQun Liu. Alp-kd: Attention-based layer projection for\\nknowledge distillation. In AAAI , 2021. 5\\n[40] Dustin Podell, Zion English, Kyle Lacey, Andreas\\nBlattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna,\\nand Robin Rombach. Stable-diffusion-xl-base-1.0. https:\\n/ / huggingface . co / stabilityai / stable -\\ndiffusion-xl-base-1.0 , 2023. 7, 12\\n[41] Dustin Podell, Zion English, Kyle Lacey, Andreas\\nBlattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna, and\\nRobin Rombach. Sdxl: Improving latent diffusion mod-\\nels for high-resolution image synthesis. arXiv preprint\\narXiv:2307.01952 , 2023. 1, 2, 3, 4, 5, 7, 8, 12, 13\\n[42] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Milden-\\nhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv\\npreprint arXiv:2209.14988 , 2022. 
1\\n[43] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya\\nRamesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry,\\nAmanda Askell, Pamela Mishkin, Jack Clark, et al. Learn-\\ning transferable visual models from natural language super-\\nvision. In ICML , 2021. 3, 7, 12\\n[44] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu,\\nand Mark Chen. Hierarchical text-conditional image gen-\\neration with clip latents. arXiv preprint arXiv:2204.06125 ,\\n2022. 3, 7, 13\\n[45] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wier-\\nstra. Stochastic backpropagation and approximate inference\\nin deep generative models. In ICML , 2014. 3, 5\\n[46] Robin Rombach, Andreas Blattmann, Dominik Lorenz,\\nPatrick Esser, and Bj ¨orn Ommer. High-resolution image syn-\\nthesis with latent diffusion models. In CVPR , 2022. 1, 3, 5\\n[47] Robin Rombach, Andreas Blattmann, Dominik Lorenz,\\nPatrick Esser, and Bj ¨orn Ommer. Stable-diffusion-\\nv1.4. https://github.com/CompVis/stable-\\ndiffusion , 2022. 1, 3, 4, 5, 7\\n[48] Robin Rombach, Andreas Blattmann, Dominik Lorenz,\\nPatrick Esser, and Bj ¨orn Ommer. Stable-diffusion-\\nv2.0. https://github.com/Stability- AI/\\nstablediffusion , 2022. 1, 3, 7, 8\\n[49] Robin Rombach, Andreas Blattmann, Dominik Lorenz,\\nPatrick Esser, and Bj ¨orn Ommer. Stable-diffusion-\\nv2.0. https://huggingface.co/stabilityai/\\nstable-diffusion-2 , 2022. 13, 15, 16, 17, 18\\n[50] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou,\\nAntoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets:\\nHints for thin deep nets. arXiv preprint arXiv:1412.6550 ,\\n2014. 3, 5\\n[51] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-\\nnet: Convolutional networks for biomedical image segmen-\\ntation. In Medical Image Computing and Computer-Assisted\\nIntervention–MICCAI 2015: 18th International Conference,\\nMunich, Germany, October 5-9, 2015, Proceedings, Part III\\n18, pages 234–241. Springer, 2015. 
3, 5\\n[52] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch,\\nMichael Rubinstein, and Kfir Aberman. Dreambooth: Fine\\ntuning text-to-image diffusion models for subject-driven\\ngeneration. In CVPR , 2023. 1, 14\\n10', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 9}),\n",
       " Document(page_content='[53] Chitwan Saharia, William Chan, Saurabh Saxena, Lala\\nLi, Jay Whang, Emily L Denton, Kamyar Ghasemipour,\\nRaphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans,\\net al. Photorealistic text-to-image diffusion models with deep\\nlanguage understanding. 2022. 3\\n[54] Tim Salimans and Jonathan Ho. Progressive distillation for\\nfast sampling of diffusion models. In ICLR , 2022. 1, 3\\n[55] Christoph Schuhmann, Romain Beaumont, Richard Vencu,\\nCade Gordon, Ross Wightman, Mehdi Cherti, Theo\\nCoombes, Aarush Katta, Clayton Mullis, Mitchell Worts-\\nman, et al. Laion-aestheics v2. https://laion.ai/\\nblog/laion-aesthetics/ , 2022. 7\\n[56] Christoph Schuhmann, Romain Beaumont, Richard\\nVencu, Cade Gordon, Ross Wightman, Mehdi Cherti,\\nTheo Coombes, Aarush Katta, Clayton Mullis,\\nMitchell Wortsman, et al. Laion-aesthetics v2\\n6.5+. https://huggingface.co/datasets/\\nChristophSchuhmann/improved_aesthetics_\\n6.5plus , 2022. 12\\n[57] Christoph Schuhmann, Romain Beaumont, Richard\\nVencu, Cade Gordon, Ross Wightman, Mehdi Cherti,\\nTheo Coombes, Aarush Katta, Clayton Mullis,\\nMitchell Wortsman, et al. Laion-aesthetics v2\\n6+. https : / / huggingface . co / datasets /\\nChristophSchuhmann/improved_aesthetics_\\n6plus , 2022. 2, 7, 8, 9, 12\\n[58] Christoph Schuhmann, Romain Beaumont, Richard Vencu,\\nCade Gordon, Ross Wightman, Mehdi Cherti, Theo\\nCoombes, Aarush Katta, Clayton Mullis, Mitchell Worts-\\nman, et al. Laion-5b: An open large-scale dataset for training\\nnext generation image-text models. In NeurIPS , 2022. 7\\n[59] Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, and\\nYan Yan. Post-training quantization on diffusion models. In\\nCVPR , 2023. 1\\n[60] Eli Shechtman and Michal Irani. Matching local self-\\nsimilarities across images and videos. In CVPR , 2007. 5\\n[61] Jiaming Song, Chenlin Meng, and Stefano Ermon.\\nDenoising diffusion implicit models. arXiv preprint\\narXiv:2010.02502 , 2020. 
3, 12, 13, 15, 16, 17, 18\\n[62] Narek Tumanyan, Omer Bar-Tal, Shai Bagon, and Tali\\nDekel. Splicing vit features for semantic appearance transfer.\\nInCVPR , 2022. 5\\n[63] Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali\\nDekel. Plug-and-play diffusion features for text-driven\\nimage-to-image translation. In CVPR , 2023. 1, 6\\n[64] Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro\\nCuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj,\\nand Thomas Wolf. Diffusers: State-of-the-art diffusion\\nmodels. https://github.com/huggingface/\\ndiffusers , 2022. 12\\n[65] Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro\\nCuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj,\\nand Thomas Wolf. Diffusers: State-of-the-art diffusion\\nmodels. https://github.com/huggingface/\\ndiffusers / blob / main / examples / text _ to _\\nimage/train_text_to_image_sdxl.py , 2023. 12\\n[66] Patrick von Platen, Suraj Patil, Anton Lozhkov, Pe-\\ndro Cuenca, Nathan Lambert, Kashif Rasul, MishigDavaadorj, and Thomas Wolf. Stabilityai: Sdxl-base-\\n1.0. https://huggingface.co/stabilityai/\\nstable-diffusion-xl-base-1.0 , 2023. 2, 7, 13,\\n15, 16, 17\\n[67] Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng\\nZhu, Rui Zhao, and Hongsheng Li. Human preference score\\nv2: A solid benchmark for evaluating human preferences of\\ntext-to-image synthesis. arXiv preprint arXiv:2306.09341 ,\\n2023. 2, 7, 13\\n[68] Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, and Hong-\\nsheng Li. Better aligning text-to-image models with human\\npreference. In ICCV , 2023. 2, 7\\n[69] Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gun-\\njan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yin-\\nfei Yang, Burcu Karagol Ayan, et al. Scaling autoregres-\\nsive models for content-rich text-to-image generation. arXiv\\npreprint arXiv:2206.10789 , 2022. 9\\n[70] Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Goj-\\ncic, Or Litany, Sanja Fidler, and Karsten Kreis. 
Lion: La-\\ntent point diffusion models for 3d shape generation. arXiv\\npreprint arXiv:2210.06978 , 2022. 1\\n[71] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding\\nconditional control to text-to-image diffusion models. In\\nICCV , 2023. 1\\n11', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 10}),\n",
       " Document(page_content='Appendix\\nA. Implementation details\\nA.1. Training\\nWe use the officially released SDXL-Base-1.0 [40] with\\nDiffusers library [64, 65], while only replacing its\\nU-Net with our efficient U-Net. We use the same two\\ntext encoders used in SDXL, which are OpenCLIP ViT-\\nbigG [22] and CLIP ViT-L [43]. For V AE, we use\\nsdxl-vae-fp16-fix [4], which enables us to use FP16\\nprecision for V AE computation. We initialize the weights\\nof our U-Net with the teacher’s U-Net weights at the same\\nblock location. We freeze the text encoders, V AE, and\\nthe teacher U-Net of SDXL and only fine-tune our U-Net.\\nWe train our KOALA models on LAION-Aesthetics V2\\n6+ [57] dataset (about 800M text-image pairs) for 100K\\niterations using four NVIDIA A100 (80GB) GPUs with a\\nresolution of 1024×1024 , a discrete-time diffusion sched-\\nule [20], size- and crop-conditioning as in SDXL [41], a\\nbatch size of 128, AdamW optimizer [35], a constant learn-\\ning rate of 10−5, and FP16 precision. For a fair compar-\\nison to our counterpart BK-SDM [26], we train our effi-\\ncient U-Nets with their distillation method under the same\\ndata setup ( e.g., BK-SDXL-1B and -700M in Tab. 4). Fur-\\nthermore, we also train SDM-Base and SDM-Small in BK-\\nSDM [26] with our approach (Ours-SDM-Base & Ours-\\nSDM-Small in Tab. 4), following the BK-SDM training\\nrecipe on LAION-Aesthetics V2 6.5+ [56]. For the ablation\\nstudies in Tab. 3, we train all models for 30K iterations with\\na batch size of 32 on LAION-Aesthetics V2 6.5+ datasets\\nfor fast verification.\\nA.2. Inference\\nWhen generating samples, we also use FP16-precision and\\nsdxl-vae-fp16-fix [4] for V AE-decoder. Note that\\nin the SDXL original paper [41], authors used DDIM sam-\\npler [61] to generate samples in the figures while the dif-\\nfuser’s official SDXL code [65] used Euler discrete sched-\\nuler [24] as the default scheduler. 
Therefore, we also use\\nthe Euler discrete scheduler for generating samples. With\\nthe Euler discrete scheduler, we set the denoising step to 50\\nonly for quantitative evaluation in Tab. 3 and Tab. 4, and\\nset it to 25 for other qualitative results or latency measure-\\nments. we set classifier-free guidance [19] to 7.5.\\nA.3. Implementation details of SA-bottom\\nFig. 7 illustrates how to choose transformer blocks when\\ndistilling self-attention (SA) features at DW3 & MID &\\nUP1 as described in Sec. 4.2 (F.3) and Tab. 3c. In Fig. 7,\\nthe Transformer blocks (yellow) with a depth of 10 is from\\nthe original SDXL’s U-Net teacher model, and the Trans-\\nformer blocks (blue) with a depth of 6 is from our KOALA-\\nT0T1T2T3T4T5T6T7T8T9Transformer blocks (depth=10)\\nT0T1T2T3T4T5Transformer blocks (depth=6)Transformer blockSACAFFFigure 7. SA-bottom illustration in Tab. 3c.\\n1B’s U-Net student model. For SA-bottom in Tab. 3c,\\nwe perform feature distillation by selecting consecutive\\nblocks from the teacher model’s transformer blocks, start-\\ning with the first one, and comparing to each transformer’s\\nself-attention (SA) features from the student model’s trans-\\nformer blocks.\\nB. Additional Analysis\\nB.1. Attention visualization for Tab. 3a and Tab. 3b\\nIn Section 4.3 of the main paper, we provide empirical\\nevidence demonstrating the paramount importance of self-\\nattention features in the distillation process. Our findings\\nparticularly highlight the significant impact of specific self-\\nattention ( SA) stages ( e.g.,UP-1&UP-2 ) on enhancing per-\\nformance. To support these results, we extensively ana-\\nlyze self-attention maps in the main paper. To complete\\nthe analysis, we expand our Principal Component Analy-\\nsis [23] (PCA) on self-attention maps to encompass all lay-\\ners in Fig. 
9.\\nAs elaborated in the main paper, self-attention begins by\\ncapturing broad contextual information ( e.g.,DW-2&DW-3 )\\nand then progressively attends to localized semantic de-\\ntails ( e.g.,MID). Within the decoder, self-attentions are\\nincreasingly aligned with higher-level semantic elements\\nUP-1&UP-2 ), such as objects, for facilitating a more ac-\\ncurate representation of appearances and structures. No-\\ntably, at this stage, the SA-based model focuses more on\\nspecific object regions than the LF-based model. This leads\\nto a marked improvement in compositional image genera-\\ntion performance.\\n12', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 11}),\n",
       " Document(page_content='0.0 0.2 0.4 0.6 0.8 1.0\\nNormalized Layer Index0.80.850.90.951.0Feature Cosine Similarity\\nDW\\nMID\\nUPFigure 8. Feature cosine similarity analysis. We plot the cross-\\nlayer cosine similarity against the normalized layer indexes of\\ntransformer block.\\nB.2. Feature cosine similarity analysis for Tab. 3c\\nKOALA models compress the computationally inten-\\nsive transformer blocks in the lowest feature levels ( i.e.,\\nDW-3&Mid&UP-1 stages). Specifically, we reduce the\\ndepth of these transformer blocks from 10 to 5 for KOALA-\\n1B and to 6 for KOALA-700M. For this purpose, we\\ndemonstrate that distilling knowledge from the consecutive\\nbottom layers of transformer blocks is a simple yet effective\\nstrategy (see third finding (F3) in the main paper).\\nTo delve deeper into the rationale behind this strategy,\\nwe conducted a thorough feature analysis of the original\\nSDXL model [41]. In particular, we investigate the evo-\\nlution of the features within the transformer blocks. We\\ncompute the cross-layer cosine similarity between the out-\\nput features of each block and those of its predecessors. A\\nlower similarity score indicates a significant contribution of\\nthe current block, whereas a higher score implies a marginal\\ncontribution.\\nFor this analysis, we leverage the diverse domain of\\nprompts in the HPSv2 dataset [67]. We compute the cross-\\nlayer cosine similarity for each stage ( DW&Mid&UP ) and\\naverage these values across all prompts. The results are il-\\nlustrated in Fig. 8. For all stages, transformer blocks ex-\\nhibit a tendency of feature saturations: While early trans-\\nformer blocks generally show a significant contribution,\\nlater blocks have less impact. This motivates us to distill the\\nlearned knowledge of consecutive bottom layers of trans-\\nformer blocks for minimal performance degradation.\\nC. Qualitative results\\nC.1. 
Comparison to other methods\\nWe show generated samples with various types of prompt\\nstyle ( e.g., portrait photo, 3d art animation, and paintings)comparing with DALLE-2 [44], SDM-v2.01and SDXL2\\nin Figs. 10 to 12. When generating samples, we set the\\nsame random seed for all models except DALLE-2 because\\nfor DALLE-2 API, we cannot set the random seed. Over-\\nall, our method surpasses DALLE-2 and SDM-v2.0 in terms\\nof visual aesthetic quality and demonstrates decent results\\nwhen compared to SDXL.\\nWe also compare our model to Stable diffusion mod-\\nels, SDM-v2.0 and SDXL, with four random seeds\\nin Fig. 13. We follow the official inference setups of each\\nmodel (SDM-v2.0 [49] and SDXL-Base-1.0 [66]) using\\nhuggingface . Specifically, SDM-v2.0 is set to gener-\\nate with DDIM scheduler [61] with 25 steps and SDXL and\\nours are set to use Euler discrete scheduler [24] with 25\\nsteps. And we set all models to use the classifier-free guid-\\nance [19] with 7.5. The results highlight the robustness of\\nthe KOALA model against varying random seed selection.\\nC.2. Comparison to BK-SDM\\nIn addition to the quantitative comparisons in the main pa-\\nper, we also provide a qualitative comparison with BK-\\nSDM [26]. As illustrated in Fig. 14, BK-SDM occasionally\\noverlooks specific attributes or objects mentioned in the text\\nprompt and generates structurally invalid images. On the\\ncontrary, our proposed model consistently generates images\\nwith enhanced adherence to the text, showcasing a superior\\nability to capture the intended details accurately.\\nC.3. Self-Comparison on KOALA models\\nKOALA-1B vs. KOALA-700M To assess the influence\\nof model size on performance, we conducted a compara-\\ntive analysis between KOALA-1B and KOALA-700M. The\\nresults, as illustrated in Fig. 15 and Fig. 16, reveal that\\nKOALA-1B is able to capture finer details and exhibits a\\nmarginally superior adherence to text prompts. 
However,\\ndespite its smaller size, KOALA-700M still delivers im-\\npressive results, with the added advantage of a significantly\\nfaster inference time.\\nControllability of KOALA The KOALA-700M model ex-\\nhibits not only faithful visual quality but also remark-\\nable controllability in response to the given text prompts.\\nFig. 17 highlights the impressive controllability for chal-\\nlenging cases, including various painter’s styles (see the first\\nrow), diverse color reflections (see the second row), a range\\nof seasons (see the third row), and different times of day\\n(see the fourth row).\\nC.4. Failure cases\\nAs noted in the main paper, the KOALA models have cer-\\ntain limitations despite their great aesthetic quality. To thor-\\n1https://huggingface.co/stabilityai/stable-diffusion-\\n2\\n2https://huggingface.co/stabilityai/stable-diffusion-\\nxl-base-1.0\\n13', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 12}),\n",
       " Document(page_content='“A kitten and a dog sat side by side” LF (BK) SA (Ours)\\n(a) Generated Image (b) PCA AnalysisDW2 DW3 MID UP1 UP2\\n“a big hippopotamus and a small rabbit.” \\nLF (BK) SA (Ours)\\n(a) Generated Image (b) PCA AnalysisDW2 DW3 MID UP1 UP2Figure 9. Extended analysis on self-attention maps of distilled student U-Nets. (a) Generated images of LF- and SA-based distilled\\nmodels, which are BK-SDM [26] and our proposal, respectively. In BK-SDM’s result, a rabbit or dog is depicted like a hippopotamus or\\ncat, repectively ( i.e., appearance leakage). (b) Visualization of PCA analysis results. Note that from the UP-1 stage, the SA-based model\\nattends to the corresponding object ( i.e., rabbit or dog) more discriminatively than the LFmodel, demonstrating that self-attention-based\\nKD allows to generate objects more distinctly.\\noughly understand these constraints, we present additional\\nexamples and categorize them into distinct cases. Fig. 18\\nclearly demonstrates that the KOALA-700M model faces\\nchallenges in complex scenarios, such as complex com-\\npositional prompts with multiple attributes (the first row),\\nrendering legible text (the second row), capturing intricate\\nstructural details (the third row), and accurately depicting\\nhuman hands (the fourth row).\\nD. Downstream task: Dreambooth\\nTo validate the transferability and generation capability of\\nour KOALA model, we apply our KOALA-700M model\\nto a custom text-to-image (T2I) downstream task, Dream-booth [52], which is a popular custom model for person-\\nalized T2I generation. We fine-tune our KOALA-700M\\nmodel on the Dreambooth dataset using resizing 1024, the\\n8-bit Adam optimizer, a constant learning rate of 5e-5, and a\\nbatch size of 4 for 500 iterations without the incorporation\\nof a class-preservation loss. The number of steps for gra-\\ndient accumulation is set to 2. For generating images, we\\nuse DPM-Solver [36] with 25 denoising steps. 
As shown\\nin Fig. 19, with about 5-6 photographs provided, subject\\ntraining is conducted alongside an identifier token, taking\\napproximately 20 minutes per subject on an NVIDIA RTX\\nA6000 GPU. The results demonstrate that the images are\\ngenerated seamlessly, without any inconsistencies between\\nthe text and the object.\\n14', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 13}),\n",
       " Document(page_content='SDXL-Base-1.0KOALA-700M (ours)SDM-v2.0DALLE-2Close-up face photography of an old man standing in the rainat night, in the street lit by lamps, leica35mm summiluxClose-up facephotography of a girlstanding in the snow at noon, in the street lit by lampsAphoto of an adorable girl, front view, centered, soft lighting, outdoorsPortrait photo of a young asian boy smiling, highly detailed, 8kPortrait photo of a standing girl,photograph, golden hair, depth of field, moody light, golden hour, centered, extremely detailed, realisticFigure 10. Qualitative comparison with DALLE-2, SDM-v2.0, and SDXL in terms of portrait photo . We follow the official inference\\nsetups of each model (SDM-v2.0 [49] and SDXL-Base-1.0 [66]) using huggingface . Specifically, SDM-v2.0 is set to generate with\\nDDIM scheduler [61] with 25 steps and SDXL and ours are set to use Euler discrete scheduler [24] with 25 steps. And we set all models\\nto use the classifier-free guidance [19] with 7.5.\\n15', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 14}),\n",
       " Document(page_content='A tiny cute puppytoy, riding a snowboard, standing character, in the galaxy, 3dblender renderA super cute and adorable puppy wearing a winter hat and scarf, fantasy, cartoon, disney, pixar, animeA 3d art animationof a cute baby raccoon walking on Mars, wearing an astronaut suit, with many stars in the sky\\nKawaii low poly squirrel character, 3d isometric render, white background, ambient occlusion, unity engine\\nSDXL-Base-1.0KOALA-700M (ours)SDM-v2.0DALLE-2\\nA 3d art character of super tiny cute rabbit, the sunflowers garden, standing character, unreal engine, 3drenderFigure 11. Qualitative comparison with DALLE-2, SDM-v2.0, and SDXL in terms of 3D art . We follow the official inference setups\\nof each model (SDM-v2.0 [49] and SDXL-Base-1.0 [66]) using huggingface . Specifically, SDM-v2.0 is set to generate with DDIM\\nscheduler [61] with 25 steps and SDXL and ours are set to use Euler discrete scheduler [24] with 25 steps. And we set all models to use\\nthe classifier-free guidance [19] with 7.5.\\n16', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 15}),\n",
       " Document(page_content='A portrait of Ludwig van Beethovenin the style of Vincent van Gogh’s “Starry Night” painting.Stippling sketchof a catwith its fur, eyes, and features meticulously crafted using thousands of tiny dots. The density of the dots varies, creating depth and shading throughout the artwork.Photo in a square composition of a graffiti-styled caton a brick wall, vibrant colors, bold strokes, and urban aesthetics come together to depict the feline in a dynamic and street art manner, extremely detailed.SDXL-Base-1.0KOALA-700M (ours)SDM-v2.0DALLE-2\\nVector illustrationof living roomin flat style, pastelcolor palette\\nOil paintingof black holeand astronaut\\nFigure 12. Qualitative comparison with DALLE-2, SDM-v2.0, and SDXL in terms of painting . We follow the official inference setups\\nof each model (SDM-v2.0 [49] and SDXL-Base-1.0 [66]) using huggingface . Specifically, SDM-v2.0 is set to generate with DDIM\\nscheduler [61] with 25 steps and SDXL and ours are set to use Euler discrete scheduler [24] with 25 steps. And we set all models to use\\nthe classifier-free guidance [19] with 7.5.\\n17', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 16}),\n",
       " Document(page_content='A cute magical flying dog, fantasy art, golden color, high quality, highly detailed, elegant, sharp focus, concept art, character concepts, digital painting, mystery, adventureSDM-v2.0SDXLKOALA-700M\\nAn illustration of a robotic wolf, wearing sunglasses and hat, cold color, raining, dark, mist, smoke,extremely detailed, photorealisticSDM-v2.0KOALA-700M\\nSDXLFigure 13. Qualitative comparison between SDM-v2.0 vs. SDXL-Base-1.0 vs. KOALA-700M (ours). For each prompt, we use 4\\nrandom seeds to generate images, while all models are generated with the same seed for each image. SDM-v2.0 [49] is set to generate with\\nDDIM scheduler [61] and SDXL and ours are set to use Euler discrete scheduler [24]. All samples are generated with 25 denoising steps\\nand the cfg-scale [19] 7.5.\\n18', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 17}),\n",
       " Document(page_content='The square boxwas next to the circular canisterThe rectangular mirrorwas hung above the blue sinkBK-SDXL-700MKOALA-700MBK-SDXL-700MKOALA-700M\\nA round muffinand a square napkinA red backpackand a blue bookA brown sheep toyand a blue vaseA dogis chasinga balland having funin the parkA boatis sailing on a lakeand birdsare flyingabove\\nA round bageland a square toaster\\nFigure 14. Qualitative comparison between BK-Base-700M vs. KOALA-700M (ours).\\n19', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 18}),\n",
       " Document(page_content='Cute Catin a Variety of Colors, Universe fulfilling the body, fantasy, renaissance aesthetic, Star Trek aesthetic, pastel colors aesthetic, intricate fashion clothing, highly detailed, surrealistic, digital painting, concept art, sharp focus, illustration\\nPencil paintingof young girl\\nRenaissance-styleportrait of an astronautin space, detailed starrybackground, reflective helmet\\nA 3d art character of a tiny cute rabbit, big reflect eyes, wearing a hoodie, in the city, full body shot, 3D, character, 3d rendering, realistic, adorable, physically based renderingVibrant paintingof a flowerpunk owl, dramatic lighting, abstract flowers, hightly detailed digital painting, 8KAbstract watercolor animeart of a magical girl surrounded by flowers, 8k, stunning intricate details, by artgermA blue haired girl, with blowing bubbles, with a sophisticated intellectual style, anime. dark, cold color\\nA vitral window, Art Noveaustyle, with colorful nature motives and a big foxin the middle, roundedupside, photorealistic\\nKOALA-1BKOALA-700MKOALA-1BKOALA-700MFigure 15. Generated samples of KOALA-1B and KOALA-700M .\\n20', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 19}),\n",
       " Document(page_content='A majestic eagleon topof a big treeat twilightWall graffitiart of astronaut holding a super soakerImpressionist oil paintingof a beachat sunsetwith a narrow aspect ratio\\nAn aerial viewof a cityat night, long exposureLong-exposure nightphotography of a starry skyover a mountain range, with light trails, high detailA sketch of a mysterious castlein the style of Gothicart with an aerial viewpointA Giant space battleship, small flying objects, stars, and nebulain the background, inspired by the movie Star Wars\\nA realistic photo of the astronautreading the bookon the mars, under the moon\\nKOALA-1BKOALA-700MKOALA-1BKOALA-700MFigure 16. Generated samples of KOALA-1B and KOALA-700M .\\n21', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 20}),\n",
       " Document(page_content='A portrait painting of a Golden Retriever like Leonard da VinciAportraitpaintingofaGoldenRetrieverlikeClaudMonetA portrait painting of a Golden Retriever like Andy Warhol\\nOil painting influenced by Monet\\'s impressionist style, presenting a sunrise over a harbor. The calm waters are bathed in a golden light from the sun, with distant silhouettes of anchored ships and boats. The sky transitions through soft hues of light pinks, greens, and yellows. The sun\\'s shimmering reflection on the water enhances the depth of the scene. The artwork is characterized by its loose, expressive brush strokes, embodying the serenity of a peaceful morningOil painting influenced by Monet\\'s impressionist style, presenting a sunrise over a harbor. The calm waters are bathed in a golden light from the sun, with distant silhouettes of anchored ships and boats. The sky transitions through soft hues of bright blue, yellows, and reds. The sun\\'s shimmering reflection on the water enhances the depth of the scene. The artwork is characterized by its loose, expressive brush strokes, embodying the serenity of a peaceful morningOil painting influenced by Monet\\'s impressionist style, presenting a sunrise over a harbor. The calm waters are bathed in a golden light from the sun, with distant silhouettes of anchored ships and boats. The sky transitions through soft hues of cool blues, white, and greys. The sun\\'s shimmering reflection on the water enhances the depth of the scene. 
The artwork is characterized by its loose, expressive brush strokes, embodying the serenity of a peaceful morning\\nA portrait painting of a Golden Retriever like Edvard Munch \"The Scream\"\\nA photo of a young girl in spring, black hair, front view, smiling, highly detailed, professional photographyA photo of a young girl in summer, black hair, front view, smiling, highly detailed, professional photographyA photo of a young girl in fall, black hair, front view, smiling, highly detailed, professional photographyA photo of a young girl in winter, black hair, front view, smiling, highly detailed, professional photography\\nA majestic lion standing in front of El Capitan in Yosemite National Park at morningA majestic lion standing in front of El Capitan in Yosemite National Park at noonA majestic lion standing in front of El Capitan in Yosemite National Park at twilightA majestic lion standing in front of El Capitan in Yosemite National Park at nightOil painting channeling Monet\\'s impressionist technique, presenting a sunrise over a harbor. The serene waters radiate with the sun\\'s golden light, and distant silhouettes of ships and boats are evident. The expansive sky is artfully painted with variations of a single purple shade. The sun\\'s shimmering reflection on the water adds depth and vibrancy to the scene. The artwork is marked by its loose, expressive brush strokes, conveying the tranquility of a peaceful morning\\nFigure 17. Generated samples of KOALA-700M . Our KOALA-700M model demonstrates faithful visual quality and remarkable con-\\ntrollability across various painter’s styles (see the first row), diverse color reflections (see the second row), a range of seasons (see the third\\nrow), and different times of day (see the fourth row), in response to the given text prompts.\\n22', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 21}),\n",
       " Document(page_content='A corgiwearing a red bowtieand a purple party hatA white dog (left) and a black cat (right) are running together with smilingA close-up photo of a young boy with a white hat, blue t-shirt, and denim jacket\\nA neon sign that reads “Backprop”A photo of a black handbagand a green walleton the wooden floor\\nA photo of the Burj Khalifa, extremely detailed, night, best quality, professional photography, light, strong contrast\\nA close-up photo of handfull of maple leaves.\\nA photo of a store with the name \"koala grocery store\"A picture of a sticky note on desk that says\"I can do it.\"A photo of a red sign that says \"No Trespassing\" in the desert.\\nA photo of the Tokyo Towerin the spring\\nA professional photo of a cultural heritage landscape with a traditional Korean house, in the background of a colorful sky, best quality, RAW photo, UHD, DSLR, soft lighting, high quality, film grain, night, light, strong contrast, contrast, glass reflection, high detailed building, building, time-lapse photography, indoor light, reflective, half aerial view, water, bridgeA professional photo of the Colosseumat sunset\\nA photorealistic image of a young girl blowing bubblesin a park, with colorful flowers and a big blue sky in the background. Shot from a close-up angle to capture the sense of playfulness and innocence\\nA photo of an old lady with her hand raisedin greeting.A woman is speaking on the phoneand making planswith a friend.\\nFigure 18. Failure cases of KOALA-700M . KOALA-700M model faces challenges in complex scenarios, such as complex compositional\\nprompts with multiple attributes (1st row), rendering legible text (2nd row), capturing intricate structural details (3rd row), and accurately\\ndepicting human hands (4th row).\\n23', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 22}),\n",
       " Document(page_content='A [V]dogA [V]dog floating on top of waterA [V]dog with a mountainin the background\\nA [V]dog in the jungleA [V]dog with the Eiffel Tower in the background\\nA [V]dogA [V]dog floating on top of waterA [V]dog with a tree and autumnleaves in the background\\nA [V]dog on the beachA [V]dog with the Eiffel Tower in the background\\nSubject InputImage SampleGenerated ImagesFigure 19. Image Generations with Dreambooth+KOALA-700M .\\n24', metadata={'source': 'pdf\\\\2312.04005.pdf', 'page': 23}),\n",
       " Document(page_content='THE SOCIAL IMPACT OF GENERATIVE AI: A NANALYSIS ON\\nCHATGPT\\nA P REPRINT\\nMaria T. Baldassarre\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nmariateresa.baldassarre@uniba.it\\nDanilo Caivano\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndanilo.caivano@uniba.it\\nBerenice Fernández Nieto\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nberenice.fernandeznieto@uniba.it\\nDomenico Gigante\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\ndomenico.gigante1@uniba.it\\nAzzurra Ragone\\nDepartment of Computer Science\\nUniversity of Bari \"A. Moro\"\\nBari, BA 70125\\nazzurra.ragone@uniba.it\\nSeptember, 2023\\nABSTRACT\\nIn recent months, the social impact of Artificial Intelligence (AI) has gained considerable public inter-\\nest, driven by the emergence of Generative AI models, ChatGPT in particular. The rapid development\\nof these models has sparked heated discussions regarding their benefits, limitations, and associated\\nrisks. Generative models hold immense promise across multiple domains, such as healthcare, finance,\\nand education, to cite a few, presenting diverse practical applications. Nevertheless, concerns about\\npotential adverse effects have elicited divergent perspectives, ranging from privacy risks to escalating\\nsocial inequality. This paper adopts a methodology to delve into the societal implications of Genera-\\ntive AI tools, focusing primarily on the case of ChatGPT. It evaluates the potential impact on several\\nsocial sectors and illustrates the findings of a comprehensive literature review of both positive and\\nnegative effects, emerging trends, and areas of opportunity of Generative AI models. 
This analysis\\naims to facilitate an in-depth discussion by providing insights that can inspire policy, regulation, and\\nresponsible development practices to foster a human-centered AI.\\nKeywords AI Social Impact ·ChatGPT Social Impact ·Human-centered AI ·Perceptions on ChatGPT ·AI Social\\nConcern\\n1 Introduction\\nIn recent months, the social impact of Artificial Intelligence (AI) has been at the forefront of public debate due\\nprimarily to the introduction of new software systems and technologies, specifically ChatGPT. The rapid development\\nof these technologies has, even more, sparked the debate regarding the advantages, limitations, and risks of Artificial\\nIntelligence’s expanding capabilities. From healthcare to cybersecurity, generative models offer a vast array of practical\\nand prosperous future possibilities. However, concerns regarding potential adverse effects present an opposing viewpoint,arXiv:2403.04667v1  [cs.AI]  7 Mar 2024', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 0}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nwith arguments spanning from privacy risks to deepening social inequalities. Models like ChatGPT personalize the\\ndigital version of the Delphic oracle, where people expect to find answers to their current problems by automating\\ntasks, seeking ChatGPT’s opinion on various issues, and even requesting advice. Nonetheless, it is essential to question\\nwhether we are genuinely resolving uncertainties or uncovering new ones regarding the scope, boundaries, and prospects\\nof generative models’ societal impact. Some other analysts have referred to an \"AI arms race\" in which companies\\nworldwide strive to showcase the best technology, innovation prowess, and leadership in the AI market. On the\\nother side of the debate, discussions refer to the rapid development of these models and the effectiveness of existing\\nlegal frameworks in safeguarding against unintended adverse outcomes. Amongst all these debates, unquestionably,\\nGenerative AI is currently undergoing a period of accelerated evolution. This evolution inevitably brings about a social\\nimpact akin to the ones experienced through numerous other technological advancements that transformed our society\\nin the past. To date, significant effects have been observed in service provision, education, and scientific analysis.\\nHowever, more profound and concerning impacts also unfold in domains like democracy, inequality, security, and\\nmilitary technology.\\nConsequently, a comprehensive examination and analysis are required to understand the positive and negative social\\nconsequences, emerging trends, and areas of improvement of generative models. 
These studies are needed to address\\npotential vulnerabilities and ensure the development of these technologies considers the diverse social contexts and\\nrealities in which they are deployed.\\nBuilding upon the preceding insights, this analysis adopts a comprehensive approach to explore the societal ramifications\\nand future trajectories of Generative AI, with a specific emphasis on ChatGPT. The analysis is organized as follows:\\nSection 2 examines the potential impacts of ChatGPT across diverse social sectors and the evolution of the debate\\nacross various spheres; Section 3 provides a brief overview of the state-of-the-art of Generative AI models as well\\nas a classification of these; Section 4 presents the study design, goals, and Research Questions, plus the search\\nstrategy. Section 5 includes the data analysis and synthesis, drawing conclusions from the literature review, and sharing\\nvisualizations of our findings. Conclusion and future work close the paper.\\n2 BACKGROUND\\nThroughout history, the advancement of technology has consistently brought about significant transformations in social\\ndynamics. Each new technological breakthrough has sparked debates regarding scientific progress’ advantages and\\npotential hazards. Currently, this discourse encompasses various automated tools, data collection and analysis, and the\\ndigitization of services, among other emerging applications, which have become integral parts of modern-day life.\\nThese novel technological applications pervade numerous societal domains, ranging from education to diplomacy,\\nexerting a profound influence and continuously reshaping various social processes. 
While there is a prevailing and\\noptimistic belief in the positive impact of technology on human progress [McKendrick, 2019], it is also becoming\\nincreasingly apparent that these disruptive forces could potentially engender unforeseen and unintended consequences.\\nIn informatics, sociology, philosophy, and politics, the development of generative models will continue to ignite\\nin-depth discussions on various subjects. These discussions include topics such as regulation, risk mitigation, liability,\\ntransparency, and accountability, as well as the effects on socialization patterns and the trajectory of technological\\ndevelopment itself.\\nIn our case, the significance of examining the social impact of ChatGPT stems from its potential to cause significant\\nsocial transformations, despite ongoing debates regarding the magnitude of these changes. ChatGPT is a powerful\\ngenerative model that may impact power dynamics at multiple scales, from individual interactions to broader social\\nstructures [Farrell et al., 2022]. This dynamic occurs in complex social environments marked by disparities, stereotypes,\\nconflicts, and various political and social organization forms. These diverse social contexts, which surround scientific\\nadvances, trigger unpredictable and immeasurable consequences that fundamentally transform how we interact with\\neach other and the world. When a disruptive force permeates a new environment, it encounters various forms of\\nresistance. It triggers unintended negative consequences, demands protection from potential vulnerabilities, causes\\nuncertainty about whether it can be regulated, and other related factors. This critical-resistant front contrasts with\\na skeptical perception that questions the gravity of this new disruption and the alarmist interpretations surrounding\\nemerging technologies, in this case. 
A third front embodies an optimistic outlook, emphasizing the manifold benefits\\nacross multiple sectors, fostering enthusiasm for potential advancements and enhancements, and envisioning their\\npotential to address significant challenges. Overall, the interaction between these forms of resistance shapes perceptions\\nof generative models’ societal impact and raises critical questions about their implications for the broader social fabric.\\n2', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 1}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nThese perspectives, as human behavior changes over time, interact and mutually influence and foster one another. In the\\ncase of ChatGPT, we also observe these attitudes, including optimism, pessimism, and skepticism, but the panorama is\\neven more complex as we present below in our literature review.\\nAs evidence of the increasing interest in ChatGPT and AI, Figure 1 depicts the fluctuations in Google search queries for\\n\"ChatGPT\" globally from May 2020 (the release of ChatGPT) until May 2023 (which includes the time frame of our\\nresearch).\\nFigure 1: Search queries evolution on “ChatGPT” from May 2020 to May 2023 via Google Trends ( https://trends.\\ngoogle.it/trends/explore?date=2020-04-28%202023-05-12&q=Chat%20GPT3&hl=en ). The \"note\" in the\\ngraph reflects an improvement in Google’s data collection system implemented on 1 January 22.\\nFig.1, furthermore, depicts a consistent low level of interest until December 2022, when a significant shift occurred,\\ncoinciding with the months following the launch of Open AI’s ChatGPT as a prototype service on 30 November 2022\\n[Gordon, 2022], which attracted global attention. In April and May of 2023, interest peaked.\\nAs we enter the third wave of AI evolution, examining how socialization processes change and assessing scientific\\ninnovations’ potential positive or negative effects on rights and freedoms is critical. Following Pinch and Bijker [Pinch\\nand Bijker, 1984], it is essential to examine the construction of scientific knowledge across different localities and\\ncontexts. 
As a result, the primary goal of this analysis is to assess the evolution of ChatGPT in recent years and to\\nexamine its current perceived impact on various social aspects, in the context of the ongoing wave of AI evolution.\\n3 STATE OF THE ART\\nGenerative pre-training (GP) was a well-known concept in machine learning applications since 2012 [noa, 2012].\\nLater, in 2017 Google introduced the transformer architecture [Vaswani et al., 2017]. These advancements led to the\\nbirth of large language models like BERT in 2018 [Devlin et al., 2019] and XLNet in 2019 [Yang et al., 2019]: these are\\npre-trained transformers (PT) but are not designed to be generative. A language model is a probability distribution over\\nsequences of words [Jurafsky and Martin, 2008]: given any sequence of words of length m, a language model assigns a\\nprobability to the whole sequence. A Large Language Model (LLM) is a language model based on a neural network\\nwith many parameters (typically billions or more). Prior to transformer-based architectures, the best-performing neural\\nNatural Language Processing (NLP) models employed supervised learning from large amounts of manually-labeled\\ndata.\\nThe main drawbacks of using supervised learning are the impossibility to use it on not well-annotated datasets, and\\nalso the prohibitive cost and time required to train extremely large language models [Radford and Narasimhan, 2018].\\nUsually, LLMs trained on a large quantity of data can perform discretely a good number of tasks; anyway, they can be\\nfine-tuned (i.e., further trained on specific data) to execute a specific task with better performance.\\nLater, in 2018 OpenAI [OpenAI, 2018] published its famous article \" Improving Language Understanding by Generative\\nPre-Training \", in which the first Generative Pre-trained Transformer (GPT) system was introduced [Radford and\\nNarasimhan, 2018]. 
GPT is a type of large language model (LLM) used mainly for Generative AI, which is a type of AI\\ncapable of generating various kinds of content, such as text and images, in response to instructions (also known as\\nprompts). Generative AI models learn the patterns and structure of their input training data and then generate new data\\nthat has similar characteristics, according to what has been asked as a prompt.\\n3', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 2}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\n3.1 Foundational models\\nA foundational model is an AI model trained on broad data at scale such that it can be adapted to a wide range of\\ndownstream tasks. The most famous and performant GPT foundation models are the ones released by OpenAI. The\\nmost recent is GPT-4 [OpenAI, a], for which OpenAI refused to publish the size and training details due to business\\nreasons [OpenAI, b].\\nOther such models include Google’s PaLM [Google, a] and Together’s GPT-JT [Together], which has been reported as\\nthe closest-performing open-source alternative to GPT-3. Meta as well has released a generative foundational language\\nmodel, named LLaMA [Meta]. Foundational GPTs can also handle media (and not only text), both for input and/or\\noutput. For example, GPT-4 is capable of processing text and images as input but only produces text as output.\\n3.2 Task-Specific Models\\nA foundational GPT model is usually further trained to better perform specific tasks and/or handle subject-matter\\ndomains. One of the most used methods for such adaptation is fine-tuning (beyond that done for the foundation model).\\nFine-tuning is an approach in which the weights of a pre-trained (language) model are trained on new data. One example\\nof this is fine-tuning LLM to comprehend and follow instructions: in January 2022, OpenAI introduced InstructGPT\\n[OpenAI, c] –a series of models which were fine-tuned to follow instructions. The gained advantages included higher\\naccuracy, less negative/toxic sentiment, and generally better alignment with user needs. Other examples of task-specific\\nmodels are chatbots, AI systems that engage in human-like conversation; ChatGPT [OpenAI, d] is currently the most\\nfamous chatbot. 
Anyway, other major chatbots currently exist, such as Microsoft’s Bing Chat [Bing] – which uses\\nOpenAI’s GPT-4 [OpenAI, a] (as part of a broader close collaboration between OpenAI and Microsoft) –, Google’s\\ncompeting chatbot Bard [Google, b] and LaMDA [Google, c], Jasper Chat [Jasper], Claude [Anthropic, a].\\nFinally, the text-to-model task is becoming quite popular. To date, some famous models are Dall-E 2 [2],Stable\\nDiffusion [Diffusion], PhotoSonic Art Generator [Writesonic] whose task is the production of images based on\\nuser-provided textual prompts. Following the text-to-image models, also the text-to-video task has been addressed\\nwith a lot of models, such as: Runway [Runway], Meta’s Make-A-Video [Make-A-Video], Google’s Imagen Video\\n[Imagen] and Phenaki [Villegas et al.]. All these models can generate video from text and/or text/image prompts.\\n3.3 Domain-specific models\\nGPT systems can be re-trained to address particular fields or domains. Examples of such models (and apps) are\\nBloombergGPT [Bloomberg] for the financial domain, which should provide help with financial news and information,\\nSlackGPT [Slack] to support the Slack instant-messaging service by providing help and guidance with navigating and\\nsummarizing discussions (based on OpenAI’s API), BioGPT [Microsoft] for the biomedical domain, to provide help\\nwith biomedical literature text generation and mining, CoPilot [GitHub] for the IT source code development domain, to\\nprovide auto-completion capabilities for developers.\\nSometimes domain-specificity is realized via software components, specifically named plug-ins or add-ons. 
For\\nexample, Google Workspace has available add-ons such as GPT for Sheets and Docs – which is reported to aid the use\\nof spreadsheet functionality in Google Sheets.\\n4 STUDY DESIGN\\nTo perform this review, we followed the protocol proposed in [Cartaxo et al., 2018], and we completed the review\\nprocess with the strategies presented in [Kitchenham and Charters, 2007] for performing systematic literature reviews.\\nThe following subsections describe in detail the study design and its execution. The literature review presented in this\\nwork was carried out through the following steps:\\n1.Goal and Research questions : the goal and the correlated research questions were identified to guide the\\nliterature review;\\n2.Search strategy : defining the strategy to collect previous works published in the literature, including research\\ndatabases and query strings;\\n3.Eligibility criteria definition : the criteria used to filter the collected studies have been defined;\\n4.Data extraction : defining how relevant data were extracted to help answer the research questions;\\n5.Data analysis and synthesis : defining how to organize extracted relevant data to answer the research questions.\\n4', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 3}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFig. 2 summarizes the review protocol.\\nFigure 2: Research protocol used in the literature review\\n4.1 Goal and Research Question Definition\\nWe formulated the following research questions to analyze the diverse dimensions of ChatGPT’s impact.\\nBased on this goal, we defined the following research questions:\\n•RQ1 : What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\n•RQ2 : What are the emerging trends perceived in ChatGPT development?\\n•RQ3 : Which areas of improvement can be identified in the development of such technologies?\\n4.2 Search strategy\\nThe research has been split into two distinct parts: part one , focused on grey literature, and part two , focused on white\\nliterature. Including both grey literature and academic contributions allowed us to conduct a deeper exploration of the\\nimpact of ChatGPT in various settings and from various perspectives.\\nIn part one – started on 29th November 2023 – we focused on grey-literature sources, like blog posts and news articles\\nfrom multiple domains – such as business, education, technology, and society – emphasizing ChatGPT. Here our goal\\nwas to feel the sentiment of the media and tech sphere and capture feelings on ChatGPT that cannot emerge from\\nwhite-literature sources. 
Furthermore, some pilot searches on white-literature sources produced very few results, which\\ndid not allow us to derive reliable and consistent conclusions.\\nThe string used for part one was:\\n(“ChatGPT” AND “social concerns”) (“ChatGPT” AND “social impact”) (“ChatGPT” AND “Human rights”) (“ChatGPT”\\nAND “society*”) (“ChatGPT” AND “education*”) (“ChatGPT” AND “ethics”)\\nThis string was used to perform a keyword-based search on Google search engine1; the search was performed in\\nprivate-browsing mode, after logging out from personal accounts and erasing all web cookies and history [Piasecki\\net al., 2017].\\nIn the end, this search resulted in 1230 literature sources.\\nWhile executing part one of the research, we continued executing periodic pilot searches for white-literature sources.\\nIn late February 2023, we noticed a notable change: the number of scientific articles addressing ChatGPT increased\\nsignificantly, encompassing diverse approaches and perspectives. This may be due to the fact that academic papers need\\nmore time to be reviewed by peers and published (w.r.t. blog posts and news articles).\\nSo, for part two of the research – started on 22nd February 2023 – we decided to use Google Scholar2as white-literature\\nsearch engine; here we searched for scientific articles of various fields – such as business, education, technology, society,\\nhealthcare. The string used for part two of the research was the following:\\n1https://www.google.it/\\n2https://scholar.google.com/\\n5', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 4}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\n(\"Large Language Model\" AND \"Social impact\") (\"Large Language Model\" AND Human Rights\") (\"Large Language\\nModel\" OR \"ChatGPT\" AND \"Ethics\")(\"Large Language Model\" AND \"ChatGPT\" AND \"Ethics\") (\"Large Language\\nModel\" OR \"ChatGPT\" AND \"Social concerns\") (\"Large Language Model” AND \"ChatGPT\" AND \"society*\") (\"Large\\nLanguage Model\" AND \"ChatGPT\" AND \"education*\") (\"Chat GPT\" AND \"social concerns\") (\"ChatGPT\" AND \"social\\nimpact\") (\"ChatGPT\" AND \"Human rights\") (\"ChatGPT\" AND \"society*\") (\"ChatGPT\" AND \"education*\") (\"ChatGPT\"\\nAND \"ethics\")\\nIn the end, this search resulted in 86 new literature sources.\\nAll the documents obtained with this search strategy (both in part one and part two) were surveyed using a 3-stages\\ninformation classification process. In the first stage, only the title and keywords of the collected articles were read. In\\nthe second stage, we analyzed the abstract of each article while in the third stage we read the complete article. All these\\nstages were conducted separately and in blind-view way by two of the authors. In case of a disagreement, a third author\\nmanually verified and took the final decision.\\nAll found publications were subjected to the selection criteria outlined in Sec. 
4.3 to determine their relevance for\\ninclusion in the analysis.\\n4.3 Eligibility Criteria Definition and Data Extraction\\nThe selection procedure used for filtering the identified pool of 1316 papers was based on the following criteria:\\n• for every text the content should be mainly related to ChatGPT and LLM,\\n• the content should be written in English.\\nThen, the following criteria helped us to ensure we only included publications providing substantial information to our\\nanalysis, especially on ChatGPT, while we excluded papers with only brief mentions or tangential references:\\n•Forblog posts , we required the author’s name to be consistently provided, and the blogs should be specialized\\nin the relevant subject matter.\\n•Regarding news articles , we preferred those that offered an extensive analysis. We adopted this criterion\\nbecause initial news coverage of ChatGPT tended to be repetitive, often focusing on its capabilities and\\nlimitations and just providing a brief history of the model.\\n•In the case of academic articles , we sought diverse approaches to ensure a comprehensive perspective rather\\nthan lean solely on a single field, such as, for instance, the impact of ChatGPT in education.\\nAfter applying these selection criteria, we selected a total of 71papers from our initial pool of 1316 articles.\\nIn the Data Extraction step, we extracted all relevant data that could help answer any of the research questions. The\\nextraction process was performed by two of the authors and conflicts were solved by a third author in a blind-view way.\\nWe used Atlas.ti3to tabulate and organize data. More detailed information regarding the data and how it was indexed\\ncan be found in the online appendix [Baldassarre et al., 2023].\\n5 DATA ANALYSIS AND SYNTHESIS\\nPart one of the search has been conducted from 29th November 2022 to 22nd February 2023. Part two has been\\nconducted from 22nd February 2023 to 19th May 2023. 
Table 1 details the results obtained in both parts, as well as the\\ndocuments selected once our selection criteria were applied.\\nResearch phase Resources\\nretrievedResources\\nanalyzedResources\\nselected\\nFirst part, until Feb. 22 on Google 1230 300 25\\nSecond part, until May 19 on Google scholar 86 63 46\\nTable 1: Amount of documents collected grouped by research phase.\\n3https://atlasti.com/\\n6', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 5}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nIn order to answer RQ1, RQ2 and RQ3 we performed an analysis using Atlas.ti. Atlas.ti is a qualitative research tool\\nthat enables the systematic organization of documentary resources by using codes and creating documentary categories.\\nAtlas.ti allowed us to visualize and intuitively present content trends within the analyzed documents. We employed this\\ntool to analyze the selected papers, which were organized in Atlas.ti \"document groups\" following the categories in\\nTable 2.\\nCodes for text analysis in Atlas.ti\\nPositive Impact Negative Impact Emerging trends Areas for improvement\\nBenefits to education Disinformation risk Impact on the tech/AI\\nmarketNeed for appropriate\\nregulation\\nBenefits to customer\\nserviceNegative impact on\\nfreedom of expressionCopyright uncertainty GDPR compliance\\nconcerns\\nBenefits of responses in\\nreal-timeBias concerns Uncertainty over liability\\nfor production failuresUncertainty of\\nclassification under the\\nAI Act\\n... ... ...\\nTable 2: Codes for text analysis in Atlas.ti\\nThe categories and codes presented in Table 2 are derived from an in-depth analysis of the selected papers. The codes\\nwere primarily developed \"in vivo,\" meaning they emerged as potential units of analysis during the reading and analysis\\nprocess. For example, when repetitive references were made to the potential positive impacts of ChatGPT in education,\\nwe created the code \" Benefits for education \". After reviewing and analyzing the documents, the codes were organized\\ninto four categories, which helped to address RQ1, RQ2, and RQ3. The criteria for categorization emerged from\\na thoughtful reflection on the codes and their contextual relevance. In the category of positive impacts, codes such\\nas \"24/7 Availability \" and \" Personalized feedback \" are included since they are frequently mentioned as strengths of\\nChatGPT. 
On the other hand, the category of negative impacts encompasses codes such as \" Bias concerns \", which\\nemerged as a recurring argument when discussing the potentially detrimental effects of this model. Other codes like\\n\"Privacy concern \" and \" Water footprint \" were identified, further emphasizing the importance of addressing these issues\\nin the context of ChatGPT’s implementation.\\nIn the category of emerging trends , we captured the unforeseen consequences, which, although potentially negative,\\narise unexpectedly from the evolution of the model itself. Examples include \" Copyright uncertainty \" and the \" Need\\nto clarify private sector liability \", which pose challenges that trigger transformations in particular domains such as\\nthe \" Impact on the tech/AI market \". This category also encompasses unexpected challenges to the AI Act [European\\nCommission, 2021] and its coverage of generative models. Another aspect within this category is \" Unintentional\\nmisinformation \", referring to instances where the chat model provides unintentionally inaccurate information, a matter\\ncurrently under scrutiny by various experts. Additionally, the category encompasses codes like \" Skepticism about its\\nactual impact \". These emerging trends shed light on the complex issues ChatGPT presents.\\nThe final category encompasses areas of improvement , focusing on aspects that require further development to address\\nall the adverse and unexpected effects of the model. For instance, the tag \" Negative outcomes mitigation \" highlights\\nthe need for more efforts to minimize adverse consequences. The categorization of codes as \" Uncertainty in data\\ngovernance \" also responds to the criteria as an area of improvement rather than a negative impact. 
The above-mentioned\\ndecision was made considering that current legal frameworks do not adequately account for models like GPT, thus\\nhighlighting the need for specific measures to address these cases, which will likely be developed in the coming years.\\nThis category also encompasses improvement areas, such as \" Limited up-to-date information \" and \" Limited Medical\\nterminology \", emphasizing the potential for enhancements.\\nAfter establishing the codes and categories, we visually represented their distribution across the 71 documents. The\\nresulting tree map in Atlas.ti, depicted in Figure 3, displays the frequency of the most repeated codes.\\n7', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 6}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFigure 3: Treemap- Atlas.ti with codes distribution\\nTable 3 also shows the frequency of codes, highlighting the least and most recurrent codes within the Atlas.ti analysis.\\nMost recurrent # Least recurrent #\\nBias concerns 39 Unpredictability risk 1\\nDisinformation risk 25 Prone to injection attacks 1\\nBenefits for education 22 Over-regulation risk 1\\nPrivacy concern 21 Opportunity to increase renewable energy use 1\\nNeed for appropriate regulation 20 Need for using Renewable Energy Sources 1\\nDiscrimination risk 18 Need for human rights safeguards 1\\nUnfairness in the data use in the model 18 Need for explainability and traceability 1\\nBenefits for customer service 17 Need for Accessibility and Affordability 1\\nInaccurate answers 17 Limited Medical terminology 1\\nBenefits for content creation 15 Lead people into extremist positions risk 1\\nTable 3: Most and least recurrent codes; each category is associated with its number of occurrences.\\nIn addition, Fig.4 displays a Sankey diagram that graphically depicts the distribution of code categories across the\\nanalyzed documents (in document groups-unit). The diagram shows that scientific papers, blog posts, conference\\nsymposiums, and other types of publications encompass all coding groups (negative and positive impacts, areas of\\nimprovement, and emerging trends). See Table A2 in online appendix [Baldassarre et al., 2023] for a description\\nof document categories. The articles category includes just emerging trends, areas for improvement, and negative\\nperceptions. Most document categories, including scientific papers, columns, analyses, editorials, and articles, exhibit a\\nnegative tendency. Notably, positive perceptions are more prevalent in blog-posts, conference and symposium papers,\\nand to a lesser degree in news articles. 
\" Bias concern \" and \" Disinformation risk \" are two of the most common codes\\ncontributing to negative perceptions. In contrast, \" Benefits for customer service \" and \" Multidisciplinary benefits \" are the\\nmost prevalent codes in the documents with a positive trend. Table A3 in online appendix [Baldassarre et al., 2023]\\ndetails tendencies and code frequency across all document categories.\\n8', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 7}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFigure 4: Sankey diagram illustrating the distribution of code groups across document groups\\n6 Discussion\\nRQ1. What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\nThroughout our literature review, we identified several positive and negative impacts attributed to ChatGPT. Noteworthy\\nbenefits include: the potential for enhancing customer service; multiple papers emphasize the positive impact of\\nChatGPT in this domain [International Journal of Human Rights Law Review, 2023, Ray, 2023, Paul et al., 2023,\\nHillemann and Zimprich, 2023, Davis, 2022, Lock, 2022, Deo, 2023, Rivas and Zhao, 2023, Kumordzie, 2023, Levente,\\n2023, Marr, 2023, Abdullah et al., 2022, Khowaja et al., 2023, Gupta et al., 2023, Bruff, 2023, Iskender, 2023]. The\\nmodel is highlighted as an enabler of cross-cultural dialogue, facilitating communication between individuals from\\ndifferent cultural backgrounds [International Journal of Human Rights Law Review, 2023]. Moreover, ChatGPT offers\\nthe advantage of automating repetitive tasks , freeing time for more complex and value-added activities [Davis, 2022,\\nLevente, 2023, Khan and Umer, 2023, Gupta et al., 2023]. These benefits extend to various sectors, including business\\nand healthcare [Deo, 2023]. Another key advantage is its availability around the clock . This 24/7 accessibility proves\\nvaluable in commercial, healthcare, and educational contexts [Paul et al., 2023, Levente, 2023, Lee, 2023, Sun and\\nHoelscher, 2023, Cardoso, 2023]. 
The model’s continuous availability ensures timely assistance and support, covering\\nusers’ diverse needs.\\nChatGPT also demonstrates significant potential in education , offering various advantages [Marr, 2023, Shidiq, 2023,\\nLee and Yilmaz Soylu, 2023, Sun and Hoelscher, 2023, Mhlanga, 2023, Rivas and Zhao, 2023, Abdullah et al., 2022,\\nBahrini et al., 2023, Ray, 2023, Ausat et al., 2023, Solaiman et al., 2019, Lee and Yilmaz Soylu, 2023, Tiunova and\\nMuñoz, 2023, Iskender, 2023, Bruff, 2023, Geertsema et al., 2023]. Despite concerns about plagiarism and academic\\nintegrity, the model’s integration can enhance teaching practices in several ways. It can, for example, automates\\ncurriculum creation, enabling educators to save time and streamline the process [Marr, 2023, Bahrini et al., 2023].\\nMoreover, it facilitates the development of innovative educational content, fostering an engaging learning environment\\n[Lee, 2023, Shidiq, 2023]. Additionally, the model serves as a personalized study support assistant, providing tailored\\nguidance and assistance to individual learners [Ray, 2023, Rivas and Zhao, 2023, Sun and Hoelscher, 2023, Mhlanga,\\n2023, Abdullah et al., 2022, Ausat et al., 2023]. In this perspective, a vision emphasizes the need for controlled\\nintegration and adherence to academic guidelines to ensure responsible and ethical use of generative AI models in\\neducation [Sun and Hoelscher, 2023]. By establishing appropriate regulations and ethical frameworks, the educational\\nbenefits of ChatGPT can be maximized while addressing concerns related to plagiarism and promoting an enriching\\nlearning experience.\\nIn the medical field, the model has shown promise in research, data analysis, and telemedicine applications, contributing\\nto advancements in healthcare [Bahrini et al., 2023]. Furthermore, diverse papers recognize its potential contribution to\\n9', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 8}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\naddressing environmental challenges . For instance, it may help find innovative solutions to reduce water consumption\\nin the AI industry, highlighting its role in promoting sustainability [George et al., 2023]. Similarly, ChatGPT is also\\nappreciated for its potential as an informative and accountability tool within institutions [Ray, 2023, Cardoso, 2023,\\nBiswas, 2023a]. Lastly, it positively impacts journalism, where it can help with content creation, fact-checking, and\\ngenerating engaging narratives [Davis, 2022, Marr, 2023, Bahrini et al., 2023].\\nConversely, the most recurrent social concern is bias. Several papers [International Journal of Human Rights Law\\nReview, 2023, Ray, 2023, Paul et al., 2023, Lee, 2023, Shidiq, 2023, Biddle, 2023, C and J, 2023, Equality Now,\\n2023, Robson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023a, Vidhya et al., 2023, Khan and Umer, 2023, Ausat et al.,\\n2023, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Treude and Hata, 2023, Farrell et al., 2022, Tiunova\\nand Muñoz, 2023, Thirunavukarasu et al., 2023, Stepanechko and Kozub, 2023, Geertsema et al., 2023] discuss the\\nrisk of deepening existing biases and how ChatGPT can include sexist and racist views due to the characteristics of\\nthe data used during its training. In this sense, there is a special concern about using these tools in various sectors,\\nincluding finance [Khan and Umer, 2023] and other social activities, which can replicate and deepen structural and\\nhistorical inequalities. 
Our analysis also reveals another perceived negative impact, which is its potential to generate\\nfalse information and facilitate the spread of disinformation [Ray, 2023, Paul et al., 2023, Hillemann and Zimprich,\\n2023, Davis, 2022, Lock, 2022, Bahrini et al., 2023, Equality Now, 2023, Robson, 2023, Li, 2023, Biswas, 2023a,\\nVidhya et al., 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Wolf, 2023, Vallance,\\n2022, Rozado, 2023-03, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Farrell et al., 2022, Tiunova and\\nMuñoz, 2023, Khowaja et al., 2023]. This misuse of technology directly affects rights related to access to accurate\\ninformation and freedom of expression, as well as democratic stability. This phenomenon is particularly relevant as,\\nalthough remarkable in artificial intelligence, the advancements and improvements in generative models have raised\\nconcerns and uncertainties regarding their impact on democratic processes . There is a special concern about the\\nease with which false information can be generated and disseminated, especially in critical contexts such as elections,\\nreferendums, political instability, war conflicts, or under dictatorial regimes. Furthermore, this potential negative impact\\nincludes the digital public sphere, where spreading hate speech on social networks [Institute for Human Rights and\\nBusiness, 2023] can lead to social fragmentation and bolster the manipulation of democratic institutions and social\\ncontrol. 
False information can be weaponized across various domains, from the stock market to information warfare\\nand propaganda.\\nIn the same vein, another relevant concern is privacy [International Journal of Human Rights Law Review, 2023, Ray,\\n2023, Paul et al., 2023, Deo, 2023, Lee, 2023, Bahrini et al., 2023, Mhlanga, 2023, C and J, 2023, Biswas, 2023a,\\nRobson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023b, Vidhya et al., 2023, Khan and Umer, 2023, Helberger and\\nDiakopoulos, 2023, Vallance, 2022, Khlaif, 2023, Paul et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nIskender, 2023]. Different aspects contribute to this concern, including the coverage of these models under AI Act\\nregulations, its extensive use of user data (particularly minors), the potential for surveillance applications, the data\\nvulnerability to cyber attacks targeting ChatGPT, and the privacy implications when integrating the model into various\\ndomains such as education and the military. Privacy is a growing concern as illustrated by the case of the Italian state\\nban. In this case, the Italian authorities asked OpenAI to “expand its privacy policy for users and made it also accessible\\nfrom the sign-up page prior to registration with the service” [Garante per la protezione dei dati personali, 2023] in order\\nto operate in Italy. 
This case highlights the urgency for ensuring responsible and transparent use of the technology.\\nAnother important negative impact is the risk of job loss , as automation and AI capabilities advance [Davis, 2022, Deo,\\n2023, Rivas and Zhao, 2023, Lee and Yilmaz Soylu, 2023, Biddle, 2023, Robson, 2023, Khan and Umer, 2023, Air\\net al., 2023, Wolf, 2023, Curtis and ChatGPT§, 2023, Gabbiadini et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023].\\nFurthermore, other papers show apprehension about over-regulation [Helberger and Diakopoulos, 2023], highlighting\\nthe needed balance between ensuring ethical and responsible use of AI technologies while avoiding stifling innovation.\\nAdditionally, the model’s own cybersecurity is listed as a negative impact [Levente, 2023, Bahrini et al., 2023, C\\nand J, 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Khowaja et al., 2023].\\nThe comprehensive list of negative perceptions on ChatGPT impacts can be found in Table A1 in online appendix\\n[Baldassarre et al., 2023], providing further insights into the concerns found in the literature.\\nRQ2. What are the emerging trends perceived in ChatGPT development?\\nAn essential part of our review shows emerging trends [Hillemann and Zimprich, 2023, Telegraph, 2023, DiBenedetto,\\n2023, Deo, 2023, Equality Now, 2023, Institute for Human Rights and Business, 2023, Kumordzie, 2023, Vallance,\\n2022, Perrigo, 2023, Al Ashry, 2023, Bareis, 2023, Curtis and ChatGPT§, 2023, Rutinowski et al., 2023, Lee and\\nYilmaz Soylu, 2023, Rozado, 2023-03, George et al., 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas, 2023b,\\nVidhya et al., 2023, Suguri Motoki et al., 2023, Rivas and Zhao, 2023, Khlaif, 2023, Bahrini et al., 2023, Ray, 2023, Khan\\nand Umer, 2023, Ausat et al., 2023, Paul et al., 2023, Tamkin et al., 2021, Solaiman et al., 2019, Lee and Yilmaz Soylu,\\n10', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 9}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\n2023, Geertsema et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023, Bruff, 2023]. Within this category we found uncertainty surrounding copyright , which is a prominent trend,\\nraising doubts about the possibility of profiting from chat-generated content and whether such content is subject to\\ncopyright protection. Resolving these issues requires legislative intervention, leading to a particular and complex debate\\namong authorities regarding the legal implications of AI-generated content [Hillemann and Zimprich, 2023]. There is\\nalso a growing demand for the enhancement of regulatory frameworks to safeguard original content, considering that\\nmodels like ChatGPT have the potential to negatively impact the work of scientists, writers, researchers, and artists\\n[Al Ashry, 2023, Ray, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023]. Notably, analyses such as by Khowaja et al. [Khowaja et al., 2023] raise crucial questions regarding ownership\\nrights over the data used to train the model and the ownership of the model itself. As mentioned earlier, there is a\\ndominant call for establishing ethical use guidelines , both at a general societal level [Gabbiadini et al., 2023] and\\nspecifically within educational and research institutions [Mhlanga, 2023, Khlaif, 2023, Ausat et al., 2023, Solaiman\\net al., 2019, Tamkin et al., 2021, Lee and Yilmaz Soylu, 2023, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023,\\nIskender, 2023, Stepanechko and Kozub, 2023, Bruff, 2023]. 
These guidelines will be pivotal in ensuring responsible\\nadoption models such as ChatGPT.\\nAnother emerging trend is developing transparency mechanisms [Equality Now, 2023, Lee and Yilmaz Soylu, 2023,\\nLee, 2023, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nStepanechko and Kozub, 2023]. Transparency is considered a vital strategy to address resistance toward adopting\\nthese models in various contexts [Bahrini et al., 2023] and to mitigate potential AI stigmatization. However, it is also\\nacknowledged that transparency poses challenges that need to be overcome [Khowaja et al., 2023].\\nAnother crucial issue to highlight is the accountability for potentially harmful uses of such technology [Deo, 2023,\\nBiswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and Kozub,\\n2023, Helberger and Diakopoulos, 2023], which involves both the end-users and the companies responsible for its\\ndevelopment [Helberger and Diakopoulos, 2023, Deo, 2023, Institute for Human Rights and Business, 2023, George\\net al., 2023, Biswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and\\nKozub, 2023]. This concern extends to applications in sensitive domains like the military [Deo, 2023]. Furthermore,\\nother several notable trends emerge, including the need for timely and appropriate regulation [Gabbiadini et al.,\\n2023, Bruff, 2023], the existence of political bias (29, 40, 42, 62), transformations within the AI market [Deo,\\n2023, Equality Now, 2023, Kumordzie, 2023, Perrigo, 2023, Tamkin et al., 2021, Farrell et al., 2022, Geertsema et al.,\\n2023], and the integration of renewable technologies and environmental awareness within this field [C and J, 2023,\\nInternational Journal of Human Rights Law Review, 2023].\\nRQ3. 
Which areas of improvement can be identified in the development of such technologies?\\nThe findings concerning areas for improvement within the field of generative AI present a diverse range of perspectives\\n[Hillemann and Zimprich, 2023, Helberger and Diakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023,\\nInstitute for Human Rights and Business, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al., 2023, Kumordzie,\\n2023, Wolf, 2023, Vallance, 2022, Perrigo, 2023, Al Ashry, 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas,\\n2023a, Zhuo et al., 2023, Abdullah et al., 2022, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Solaiman et al., 2019, Tamkin et al., 2021, C and J, 2023, Tiunova and Muñoz,\\n2023, Khowaja et al., 2023, Gabbiadini et al., 2023, Gupta et al., 2023, Iskender, 2023, Stepanechko and Kozub, 2023,\\nBruff, 2023]. One prominent area is the examination of regulations [Hillemann and Zimprich, 2023, Helberger and\\nDiakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023, Institute for Human Rights and Business, 2023,\\nKumordzie, 2023, Wolf, 2023, Al Ashry, 2023, George et al., 2023, Levente, 2023, Lee and Yilmaz Soylu, 2023, Ray,\\n2023, Solaiman et al., 2019, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023, Bruff, 2023],\\nparticularly regarding whether the risk-based approach outlined in the AI Act effectively covers generative models\\n[Helberger and Diakopoulos, 2023, Wolf, 2023]. It is suggested that comprehensive guidelines should encompass the\\nentire spectrum, from its application to the AI Research and Development (R&D) [George et al., 2023]. Furthermore,\\nthere is a growing advocacy for a people-centred vision [Wolf, 2023, Kumordzie, 2023, Ray, 2023] that emphasizes\\nthe importance of human rights and ethical considerations in designing and implementing generative AI systems.\\nSpecifically, Solaiman et al. 
[Solaiman et al., 2019] examine the need to \"build frameworks for navigating trade-offs\"\\nand develop decision-making frameworks that account for the complexities and potential trade-offs associated with\\ngenerative AI. Similarly, Li [Levente, 2023] highlights the necessity to transform the European regulatory paradigm to\\neffectively address the challenges posed by LLMs.\\nAnother area of opportunity lies in addressing technical limitations within generative AI models [Levente, 2023, Curtis\\nand ChatGPT§, 2023, Sun and Hoelscher, 2023, Abdullah et al., 2022, Bahrini et al., 2023, Cao et al., 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Thirunavukarasu et al., 2023, Gupta\\net al., 2023, Iskender, 2023]. For instance, a significant challenge is the presence of fictional references [Tiunova\\n11', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 10}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nand Muñoz, 2023] within the generated text, which hampers its reliability. Additionally, words repetition further\\naffects the produced text’s overall quality [Wolf, 2023]. Moreover, the phenomenon of inaccurate information, named\\n\"hallucinations \" [Wolf, 2023, Tamkin et al., 2021], and the lack of context understanding [Paul et al., 2023], are also\\nidentified as areas requiring attention and improvement. Furthermore, the literature highlights the need for enhanced\\nrisk mitigation mechanisms [Equality Now, 2023, Biddle, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al.,\\n2023, Vallance, 2022, Perrigo, 2023]. This entails refining processes for filtering potentially harmful responses\\n[Biddle, 2023], improving the quality and reliability of the data used to train the models [Levente, 2023], and\\nincorporating ethical guidelines for its development [Air et al., 2023], among other measures [Perrigo, 2023].\\nData governance is another significant area that requires attention [Deo, 2023, Ray, 2023, Khan and Umer, 2023,\\nTiunova and Muñoz, 2023]; this crucial task involves safeguarding sensitive information against security breaches,\\nunauthorized access, and information theft [Ray, 2023, Khan and Umer, 2023]. In the same vein, establishing clear\\nguidelines regarding the scope and limitations of information exchange with third parties is also paramount\\n[Tiunova and Muñoz, 2023].\\nOther areas of opportunity include the need for up-to-date data [Sun and Hoelscher, 2023, Zhuo et al., 2023, Abdullah\\net al., 2022, Gupta et al., 2023] to ensure the accuracy and relevance of generative AI models. 
However, this requirement\\npresents a trade-off between incorporating new data to improve performance or addressing data governance issues first.\\nThe literature review also highlights several other areas of opportunity for improvement; these include promoting\\nend-user responsibilities [Li, 2023], advocating for timely regulation [Li, 2023, Tiunova and Muñoz, 2023], and\\nraising awareness of environmental impact [Tiunova and Muñoz, 2023, Geertsema et al., 2023], among other\\nconsiderations.\\n7 CONCLUSION AND FUTURE WORK\\nWhile we are still in the early stages of evaluating the social impact of generative AI models, this systematic literature\\nreview allows us to gain initial insights into the perceptions surrounding their emergence in contemporary society.\\nOur analysis has revealed notable areas of concern, particularly privacy and the potential for bias . As generative\\nmodels continue to be adopted in diverse social contexts, addressing and mitigating issues related to inequality, bias,\\ndiscrimination, and stereotypes becomes urgent.\\nIn light of this, it is essential to note that generative AI reflects the social context in which it was created. As these\\nmodels are trained on data that captures various aspects of our reality, it becomes clear that addressing their flaws and\\nbiases requires a comprehensive understanding of the broader social context within which they operate. Rather than\\nsolely focusing on repairing the model, it is imperative to also engage in a critical examination of the social factors that\\ncontribute to these biases and limitations.\\nSome analyses argue that generative AI models, such as ChatGPT, are not intended to address social inequalities.\\nWhile this may be true, it is also essential that these models do not inadvertently contribute to exacerbating social\\nissues. Acknowledging that scientific breakthroughs do not occur within a social vacuum is critical. 
Therefore,\\nwe must foster a conscious, responsible, and ethically-driven progression of generative AI. Equally important is to\\nemphasize that generative AI models hold immense potential and offer substantial benefits across various fields and\\nsectors, including education, medicine, marketing, business, research, and science. Their impact extends beyond\\ninnovation and significantly influences the legislative landscape. Consequently, policymakers need to address the\\nnecessity for appropriate regulation that not only addresses significant concerns associated with their use, but also\\nsupports and facilitates the ethical development of generative AI models [Clayton, 2023, Anthropic, b]. Undoubtedly,\\nAI generative models have reshaped our way of being in the world, triggering profound changes in our perception\\nand engagement with it. In our relentless pursuit to emulate human interaction, we have also confronted stereotypes,\\nbiases, and imperfections. Rather than succumbing to discouragement, we should use them as motivation to address\\nthem diligently and strive for continuous improvement. More work is required to develop more robust frameworks\\nand ethical guidelines, not only to improve accuracy and efficiency but also to ensure responsible deployment. As\\npart of future work, we propose evaluating ChatGPT regulation in the US, Europe, and Latin America. This analysis\\nwill examine how current legal tools address generative models’ challenges in particular locations. Additionally, to\\nunderstand ChatGPT users’ professional and social views, a survey has been designed and will be distributed among\\nprofessionals and researchers from diverse universities and research centers worldwide. With these exercises, we want\\nto gain a comprehensive understanding of its adoption, regulatory challenges, user perspectives, and deepening into its\\nsocial impact.\\n12', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 11}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nReferences\\nJoe McKendrick. Learning to trust artificial intelligence: An optimist’s view,\\n2019. URL https://www.forbes.com/sites/joemckendrick/2019/06/09/\\nlearning-to-trust-artificial-intelligence-an-optimists-view/ .\\nHenry Farrell, Abraham Newman, and Jeremy Wallace. Spirals of delusion how al distorts decision-making and\\nmakes dictators more dangerous. 2022. ISSN 0015-7120. URL https://www.foreignaffairs.com/world/\\nspirals-delusion-artificial-intelligence-decision-making .\\nCindy Gordon. Will 2023 be the year that OpenAI’s ChatGPT breaks free?, 2022. URL https://www.forbes.com/\\nsites/cindygordon/2022/12/29/will-2023-be-the-year-that-openais-chatgpt-breaks-free/ .\\nTrevor J. Pinch and Wiebe E. Bijker. The social construction of facts and artefacts: Or how the sociology of\\nscience and the sociology of technology might benefit each other. 14(3):399–441, 1984. ISSN 0306-3127. URL\\nhttps://www.jstor.org/stable/285355 .\\nAcoustic Modeling Using Deep Belief Networks | IEEE Journals & Magazine | IEEE Xplore, 2012. URL https:\\n//ieeexplore.ieee.org/document/5704567 .\\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and\\nIllia Polosukhin. Attention Is All You Need, December 2017. URL http://arxiv.org/abs/1706.03762 .\\narXiv:1706.03762 [cs].\\nJacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Trans-\\nformers for Language Understanding, May 2019. URL http://arxiv.org/abs/1810.04805 . arXiv:1810.04805\\n[cs].\\nZhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. XLNet: Generalized\\nAutoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems ,\\nvolume 32. Curran Associates, Inc., 2019. 
URL https://proceedings.neurips.cc/paper_files/paper/\\n2019/hash/dc6a7e655d7e5840e66733e9ee67cc69-Abstract.html .\\nDaniel Jurafsky and James Martin. Speech and Language Processing: An Introduction to Natural Language Processing,\\nComputational Linguistics, and Speech Recognition , volume 2. February 2008.\\nAlec Radford and Karthik Narasimhan. Improving language understanding by generative pre-training. 2018.\\nOpenAI. Openai, 2018. URL https://openai.com/research/language-unsupervised .\\nOpenAI. Gpt-4, a. URL https://openai.com/product/gpt-4 .\\nOpenAI. Openai publishing results refusal, b. URL https://cdn.openai.com/papers/gpt-4.pdf .\\nGoogle. Palm2, a. URL https://ai.google/discover/palm2 .\\nTogether. Gpt-jt. URL https://bit.ly/3TqjFes .\\nMeta. Llama. URL https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ .\\nOpenAI. Instructgpt, c. URL https://openai.com/research/instruction-following .\\nOpenAI. Chatgpt, d. URL https://openai.com/blog/chatgpt .\\nBing. Bing chat. URL https://www.bing.com/new .\\nGoogle. Bard, b. URL https://bard.google.com/ .\\nGoogle. Lamda, c. URL https://blog.google/technology/ai/lamda/ .\\nJasper. Jasper chat. URL https://www.jasper.ai/chat .\\nAnthropic. Claude, a. URL https://www.anthropic.com/index/claudes-constitution .\\nOpenAI DALL-E 2. Dall-e 2. URL https://openai.com/product/dall-e-2 .\\nStable Diffusion. Stable diffusion. URL https://stablediffusionweb.com/ .\\nWritesonic. Photosonic art generator. URL https://writesonic.com/photosonic-ai-art-generator .\\nRunway. Runway. URL https://research.runwayml.com/gen2 .\\nMeta Make-A-Video. Make-a-video. URL https://makeavideo.studio/ .\\nGoogle Imagen. Imagen video. URL https://imagen.research.google/ .\\nRuben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi\\nSaffar, Santiago Castro, Julius Kunze, and Dumitru Erhan. Phenaki. 
URL https://phenaki.video/ .\\n13', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 12}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nBloomberg. Bloomberggpt. URL https://www.bloomberg.com/company/press/\\nbloomberggpt-50-billion-parameter-llm-tuned-finance/ .\\nSlack. Slackgpt. URL https://slack.com/intl/en-gb/blog/news/introducing-slack-gpt .\\nMicrosoft. Biogpt. URL https://github.com/microsoft/BioGPT .\\nGitHub. Copilot. URL https://github.com/features/copilot .\\nBruno Cartaxo, Gustavo Pinto, and Sergio Soares. The role of rapid reviews in supporting decision-making in software\\nengineering practice. In Proceedings of the 22nd International Conference on Evaluation and Assessment in Software\\nEngineering 2018 , EASE’18, page 24–34, New York, NY , USA, 2018. Association for Computing Machinery. ISBN\\n9781450364034. doi:10.1145/3210459.3210462. URL https://doi.org/10.1145/3210459.3210462 .\\nBarbara Kitchenham and Stuart Charters. Guidelines for performing systematic literature reviews in software engineer-\\ning. 2, 01 2007.\\nJan Piasecki, Marcin Waligora, and Vilius Dranseika. Google search as an additional source in systematic reviews.\\nScience and Engineering Ethics , 24, 12 2017. doi:10.1007/s11948-017-0010-4.\\nMaria Teresa Baldassarre, Danilo Caivano, Berenice Fernàndez Nieto, Domenico Gigante, and Azzurra Ragone.\\nhttps://figshare.com/s/77c3a667671472f8eccc, 2023. URL https://figshare.com/s/77c3a667671472f8eccc .\\nEuropean Commission. Proposal for a regulation of the european parliament and of the council laying down harmonised\\nrules on artificial intelligence (artificial intelligence act) and amending certain union legislative acts, 2021. URL\\nhttps://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206 .\\nInternational Journal of Human Rights Law Review. ChatGPT and human rights: Nav-\\nigating the technological frontier, 2023. URL https://humanrightlawreview.in/\\nchatgpt-and-human-rights-navigating-the-technological-frontier/ .\\nPartha Pratim Ray. 
ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics,\\nlimitations and future scope. 3:121–154, 2023. ISSN 2667-3452. doi:10.1016/j.iotcps.2023.04.003. URL https:\\n//www.sciencedirect.com/science/article/pii/S266734522300024X .\\nJustin Paul, Akiko Ueno, and Charles Dennis. ChatGPT and consumers: Benefits, pitfalls and future research agenda.\\npage ijcs.12928, 2023. ISSN 1470-6423, 1470-6431. doi:10.1111/ijcs.12928. URL https://onlinelibrary.\\nwiley.com/doi/10.1111/ijcs.12928 .\\nDennis Hillemann and Stephan Zimprich. ChatGPT - legal challenges, legal opportunities, 2023. URL https:\\n//www.fieldfisher.com/en/insights/chatgpt-legal-challenges-legal-opportunities .\\nJessica Davis. ChatGPT: Enterprises eye use cases, ethicists remain con-\\ncerned, 2022. URL https://www.informationweek.com/big-data/\\nchat-gpt-enterprises-eye-use-cases-ethicists-remain-concerned .\\nSamantha Lock. What is AI chatbot phenomenon ChatGPT and could it replace humans?\\n2022. ISSN 0261-3077. URL https://www.theguardian.com/technology/2022/dec/05/\\nwhat-is-ai-chatbot-phenomenon-chatgpt-and-could-it-replace-humans .\\nPriyanka Deo. Is ChatGPT dangerous for humanity? 2023. ISSN 0971-8257. URL https://timesofindia.\\nindiatimes.com/business/india-business/is-chatgpt-dangerous-for-humanity/articleshow/\\n98471105.cms .\\nPablo Rivas and Liang Zhao. Marketing with ChatGPT: Navigating the ethical terrain of GPT-based chatbot technology.\\n4(2):375–384, 2023. ISSN 2673-2688. doi:10.3390/ai4020019. URL https://www.mdpi.com/2673-2688/4/2/\\n19.\\nDeladem Kumordzie. All you need to know about ChatGPT & why its\\na threat to google, 2023. URL https://medium.com/@cdkumordzie/\\nall-you-need-to-know-about-chatgpt-why-its-a-threat-to-google-fd2b887c8ff8 .\\nLevente. The pros and cons dark side of using chat GPT for businesses, 2023. URL https://medium.com/\\n@Levente22/the-pros-and-cons-dark-side-of-using-chat-gpt-for-businesses-cf2373119dab .\\nBernard Marr. 
What does ChatGPT really mean for your job?, 2023. URL https://www.forbes.com/sites/\\nbernardmarr/2023/02/13/what-does-chatgpt-really-mean-for-your-job/ .\\nMalak Abdullah, Alia Madain, and Yaser Jararweh. ChatGPT: Fundamentals, applications and social impacts. In 2022\\nNinth International Conference on Social Networks Analysis, Management and Security (SNAMS) , pages 1–8, 2022.\\ndoi:10.1109/SNAMS58071.2022.10062688. ISSN: 2831-7343.\\n14', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 13}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nSunder Ali Khowaja, Parus Khuwaja, and Kapal Dev. ChatGPT needs SPADE (sustainability, PrivAcy, digital divide,\\nand ethics) evaluation: A review, 2023. URL http://arxiv.org/abs/2305.03123 .\\nBulbul Gupta, Tabish Mufti, Shahab Saquib Sohail, and Dag Øivind Madsen. ChatGPT: A brief narrative review. 2023.\\ndoi:10.20944/preprints202304.0158.v1. URL https://www.preprints.org/manuscript/202304.0158/v1 .\\nDerek Bruff. Teaching in the artificial intelligence age of ChatGPT | teaching + learning lab, 2023. URL https:\\n//tll.mit.edu/teaching-in-the-artificial-intelligence-age-of-chatgpt/ .\\nAli Iskender. Holy or unholy? interview with open AI’s ChatGPT. 34:3414–3414, 2023. ISSN 1314-0817.\\ndoi:10.54055/ejtr.v34i.3169. URL https://ejtr.vumk.eu/index.php/about/article/view/3169 .\\nMuhammad Salar Khan and Hamza Umer. Chatgpt in finance: Addressing ethical challenges, 2023. URL https:\\n//papers.ssrn.com/abstract=4439967 .\\nHyunsu Lee. The rise of ChatGPT: Exploring its potential in medical education. page ase.2270, 2023. ISSN 1935-9772,\\n1935-9780. doi:10.1002/ase.2270. URL https://anatomypubs.onlinelibrary.wiley.com/doi/10.1002/\\nase.2270 .\\nGrace H. Sun and Stephanie H. Hoelscher. The ChatGPT storm and what faculty can do. 48(3):119,\\n2023. ISSN 0363-3624. doi:10.1097/NNE.0000000000001390. URL https://journals.lww.com/\\nnurseeducatoronline/Fulltext/2023/05000/The_ChatGPT_Storm_and_What_Faculty_Can_Do.1.\\naspx?context=FeaturedArticles&collectionId=5 .\\nAndré Guskow Cardoso. Do we need a chat-GPT-gov? the importance of technology for effective access to public\\ninformation., 2023. URL https://papers.ssrn.com/abstract=4365773 .\\nMuhammad Shidiq. The use of artificial intelligence-based chat-gpt and its challenges for the world of education;\\nfrom the viewpoint of the development of creative writing skills. 1(1):353–357, 2023. ISSN 2986-5832. 
URL\\nhttps://ejournal.unuja.ac.id/index.php/icesh/article/view/5614 .\\nJeonghyun Lee and Meryem Yilmaz Soylu. ChatGPT and assessment in higher education, 2023. URL\\nhttps://c21u.gatech.edu/sites/default/files/publication/2023/03/C21U%20ChatGPT%20White%\\n20Paper_Final.pdf .\\nDavid Mhlanga. Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning, 2023.\\nURL https://papers.ssrn.com/abstract=4354422 .\\nAram Bahrini, Mohammadsadra Khamoshifar, Hossein Abbasimehr, Robert J. Riggs, Maryam Esmaeili, Rastin Mastali\\nMajdabadkohne, and Morteza Pasehvar. ChatGPT: Applications, opportunities, and threats, 2023. URL http:\\n//arxiv.org/abs/2304.09103 .\\nAbu Muna Almaududi Ausat, Berdinata Massang, Mukhtar Efendi, Nofirman Nofirman, and Yasir Riady. Can chat GPT\\nreplace the role of the teacher in the classroom: A fundamental analysis. 5(4):16100–16106, 2023. ISSN 2654-5497.\\ndoi:10.31004/joe.v5i4.2745. URL https://www.jonedu.org/index.php/joe/article/view/2745 .\\nIrene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-V oss, Jeff Wu, Alec Radford, Gretchen\\nKrueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, and Jasmine\\nWang. Release strategies and the social impacts of language models, 2019. URL http://arxiv.org/abs/1908.\\n09203 .\\nAlla Tiunova and Felipe Muñoz. Chatgpt: Using ai in social studies academic research, 2023. URL https://papers.\\nssrn.com/abstract=4451612 .\\nPaul Geertsema, Albert Bifet, and Richard Green. ChatGPT and large language models: What are the implications for\\npolicy makers?, 2023. URL https://papers.ssrn.com/abstract=4424048 .\\nA.Shaji George, A.S.Hovan George, and A.S.Gabrio Martin. The environmental impact of AI: A case study of water\\nconsumption by chat GPT. 2023. doi:10.5281/ZENODO.7855594. URL https://zenodo.org/record/7855594 .\\nSom S Biswas. Role of chat gpt in public health. 
Annals of Biomedical Engineering , pages 1–2, 2023a.\\nSam Biddle. The internet’s new favorite AI proposes torturing iranians and surveilling mosques, 2023. URL https:\\n//theintercept.com/2022/12/08/openai-chatgpt-ai-bias-ethics/ .\\nDavid C and Paul J. ChatGPT and large language models: what’s the risk?, 2023. URL https://www.ncsc.gov.uk/\\nblog-post/chatgpt-and-large-language-models-whats-the-risk .\\nEquality Now. ChatGPT-4 reinforces sexist stereotypes by stating a girl cannot “handle technicali-\\nties and numbers” in engineering, 2023. URL https://www.equalitynow.org/news_and_insights/\\nchatgpt-4-reinforces-sexist-stereotypes/ .\\n15', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 14}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nKurt Robson. Do AI chatbots like ChatGPT pose a major cybersecurity risk?, 2023. URL https://www.verdict.\\nco.uk/do-ai-chatbots-like-chatgpt-pose-a-major-cybersecurity-risk/ .\\nBilly Perrigo. Exclusive: The $2 per hour workers who made ChatGPT safer, 2023. URL https://time.com/\\n6247678/openai-chatgpt-kenya-workers/ .\\nZihao Li. The dark side of ChatGPT: Legal and ethical challenges from stochastic parrots and hallucination, 2023.\\nURL http://arxiv.org/abs/2304.14347 .\\nN. Gowri Vidhya, D. Devi, Nithya A, and T. Manju. Prognosis of exploration on chat GPT with artificial\\nintelligence ethics. 2(9):60–69, 2023. ISSN 2764-3417. doi:10.14295/bjs.v2i9.372. URL https://www.\\nbrazilianjournalofscience.com.br/revista/article/view/372 .\\nAlex Tamkin, Miles Brundage, Jack Clark, and Deep Ganguli. Understanding the capabilities, limitations, and societal\\nimpact of large language models, 2021. URL http://arxiv.org/abs/2102.02503 .\\nChristoph Treude and Hideaki Hata. She elicits requirements and he tests: Software engineering gender bias in large\\nlanguage models, 2023. URL http://arxiv.org/abs/2303.10131 .\\nArun James Thirunavukarasu, Refaat Hassan, Shathar Mahmood, Rohan Sanghera, Kara Barzangi, Mohanned El\\nMukashfi, and Sachin Shah. Trialling a large language model (ChatGPT) in general practice with the applied\\nknowledge test: Observational study demonstrating opportunities and limitations in primary care. 9(1):e46599, 2023.\\ndoi:10.2196/46599. URL https://mededu.jmir.org/2023/1/e46599 .\\nOksana Stepanechko and Liubov Kozub. English teachers’ concerns about the ethical use of chat GPT by university\\nstudents. (25):297–302, 2023. ISSN 2710-3056. doi:10.36074/grail-of-science.17.03.2023.051. URL https:\\n//archive.journal-grail.science/index.php/2710-3056/article/view/1040 .\\nNatali Helberger and Nicholas Diakopoulos. ChatGPT and the AI act. 12(1), 2023. 
ISSN 2197-6775.\\ndoi:10.14763/2023.1.1682. URL https://policyreview.info/essay/chatgpt-and-ai-act .\\nChristopher Air, Shanaka Wijetunge, and Alexander Dimitrov. The ethics of AI: The cyber risks\\nposed by chat GPT, 2023. URL https://www.dacbeachcroft.com/en/gb/articles/2023/february/\\nthe-ethics-of-ai-the-cyber-risks-posed-by-chat-gpt .\\nZachary B. Wolf. AI can be racist, sexist and creepy. what should we do about it? | CNN politics, 2023. URL\\nhttps://www.cnn.com/2023/03/18/politics/ai-chatgpt-racist-what-matters/index.html .\\nChris Vallance. ChatGPT: New AI chatbot has everyone talking to it. 2022. URL https://www.bbc.com/news/\\ntechnology-63861322 .\\nDavid Rozado. The political biases of ChatGPT. 12(3):148, 2023-03. ISSN 2076-0760. doi:10.3390/socsci12030148.\\nURL https://www.mdpi.com/2076-0760/12/3/148 .\\nInstitute for Human Rights and Business. We asked ChatGPT about its impact on human rights and business. here’s what\\nit told us, 2023. URL https://www.ihrb.org/focus-areas/information-communication-technology/\\nwe-asked-chatgpt-about-its-impact-on-human-rights-on-business-heres-what-it-told-us .\\nSom Biswas. Prospective role of chat GPT in the military: According to ChatGPT. 2023b. ISSN 2632-3834.\\ndoi:10.32388/8WYYOD. URL https://www.qeios.com/read/8WYYOD .\\nZuheir N. Khlaif. Ethical concerns about using AI-generated text in scientific research, 2023. URL https://papers.\\nssrn.com/abstract=4387984 .\\nGarante per la protezione dei dati personali. ChatGPT: OpenAI riapre la piattaforma in italia garantendo più trasparenza\\ne più diritti a utenti e non utenti europei, 2023. URL https://www.garanteprivacy.it:443/home/docweb/-/\\ndocweb-display/docweb/9881490 .\\nNigel Curtis and ChatGPT§. To chatgpt or not to chatgpt? the impact of artificial intelligence on academic publishing.\\n42(4):275, 2023. ISSN 0891-3668. doi:10.1097/INF.0000000000003852. 
URL https://journals.lww.com/\\npidj/Citation/2023/04000/To_ChatGPT_or_not_to_ChatGPT__The_Impact_of.1.aspx .\\nAlessandro Gabbiadini, Ognibene Dimitri, Cristina Baldissarri, and Anna Manfredi. Does ChatGPT pose a threat to\\nhuman identity?, 2023. URL https://papers.ssrn.com/abstract=4377900 .\\nTech Telegraph. Microsoft, OpenAI, alphabet and big tech are ignoring the human cost behind the rise of ChatGPT and\\nother AI-powered chatbots. 2023. ISSN 0307-1235. URL https://bit.ly/3uXvdwp .\\nChase DiBenedetto. ChatGPT’s surprisingly human voice came with a human cost, 2023. URL https://mashable.\\ncom/article/chat-gpt-open-ai-workers-exploitation .\\n16', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 15}),\n",
       " Document(page_content='The Social Impact of Generative AI A P REPRINT\\nAyman Al Ashry. Chat GPT and its legal impact on society as a new form of AI - copy-\\nright - united arab emirates, 2023. URL https://www.mondaq.com/copyright/1299882/\\nchat-gpt-and-its-legal-impact-on-society-as-a-new-form-of-ai .\\nJascha Bareis. We are scared of the question chat-GPT cannot answer. because the answer is too obvious., 2023. URL\\nhttps://papers.ssrn.com/abstract=4410324 .\\nJérôme Rutinowski, Sven Franke, Jan Endendyk, Ina Dormuth, and Markus Pauly. The self-perception and political\\nbiases of ChatGPT, 2023. URL http://arxiv.org/abs/2304.07333 .\\nFabio Suguri Motoki, Valdemar Pinho Neto, and Victor Rodrigues. More human than human: Measuring ChatGPT\\npolitical bias. 2023. doi:10.2139/ssrn.4372349. URL https://ueaeprints.uea.ac.uk/id/eprint/91668/ .\\nWeeTech Solution. What is ChatGPT and the benefits of using ChatGPT, 2023. URL https://www.\\nweetechsolution.com/blog/what-is-chat-gpt-and-the-advantages-of-using-chat-gpt .\\nTerry Yue Zhuo, Yujin Huang, Chunyang Chen, and Zhenchang Xing. Exploring AI ethics of ChatGPT: A diagnostic\\nanalysis, 2023. URL http://arxiv.org/abs/2301.12867 .\\nAdam Sobieszek and Tadeusz Price. Playing games with ais: The limits of GPT-3 and similar large language models.\\n32(2):341–364, 2022. ISSN 1572-8641. doi:10.1007/s11023-022-09602-0. URL https://doi.org/10.1007/\\ns11023-022-09602-0 .\\nYihan Cao, Siyu Li, Yixin Liu, Zhiling Yan, Yutong Dai, Philip S. Yu, and Lichao Sun. A comprehen-\\nsive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. 2023.\\ndoi:10.48550/ARXIV .2303.04226. URL https://arxiv.org/abs/2303.04226 .\\nJames Clayton. Sam altman: CEO of OpenAI calls for US to regulate artificial intelligence. 2023. URL https:\\n//www.bbc.com/news/world-us-canada-65616866 .\\nAnthropic. Anthropic raises $450 million in series c funding to scale reliable. . . , b. 
URL https://www.anthropic.\\ncom/index/anthropic-series-c .\\n17', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 16}),\n",
       " Document(page_content='How Far Are We from Intelligent Visual Deductive\\nReasoning?\\nYizhe Zhang∗, He Bai∗, Ruixiang Zhang∗, Jiatao Gu,\\nShuangfei Zhai ,Josh Susskind ,Navdeep Jaitly\\nApple\\n{yizzhang,hbai7,ruixiangz,jgu32,szhai,jsusskind,ndjaitly }@apple.com\\nAbstract\\nVision-Language Models (VLMs) such as GPT-4V have recently demonstrated\\nincredible strides on diverse vision language tasks. We dig into vision-based de-\\nductive reasoning, a more sophisticated but less explored realm, and find previ-\\nously unexposed blindspots in the current SOTA VLMs. Specifically, we leverage\\nRaven’s Progressive Matrices (RPMs), to assess VLMs’ abilities to perform multi-\\nhop relational and deductive reasoning relying solely on visual clues. We perform\\ncomprehensive evaluations of several popular VLMs employing standard strate-\\ngies such as in-context learning, self-consistency, and Chain-of-thoughts (CoT) on\\nthree diverse datasets, including the Mensa IQ test, IntelligenceTest, and RA VEN.\\nThe results reveal that despite the impressive capabilities of LLMs in text-based\\nreasoning, we are still far from achieving comparable proficiency in visual deduc-\\ntive reasoning. We found that certain standard strategies that are effective when\\napplied to LLMs do not seamlessly translate to the challenges presented by visual\\nreasoning tasks. Moreover, a detailed analysis reveals that VLMs struggle to solve\\nthese tasks mainly because they are unable to perceive and comprehend multiple,\\nconfounding abstract patterns in RPM examples.\\n1 Introduction\\nRecent advancements in Vision-Language Models (VLMs) have showcased the success of models\\nsuch as GPT4-V (OpenAI, 2023) and Gemini (Team et al., 2023) across various vision language\\ntasks. These tasks include captioning, object localization, multimodal world knowledge and com-\\nmonsense, visual question answering (VQA), and vision-based coding (Yang et al., 2023). 
Previous\\nevaluations of these models have proven that state-of-the-art (SOTA) VLMs are capable of perform-\\ning well in numerous vision-based reasoning and understanding tasks (OpenAI, 2023; Team et al.,\\n2023). Notably, prior works have demonstrated that strong VLMs can accurately extract text from\\nimages, understand and reason with charts and tables, and solve simple visual math problems (Yang\\net al., 2023; Nahida Akter et al., 2023).\\nIn this study, we aim to evaluate the limitations of VLMs on challenging tasks that demand sophisti-\\ncated vision-based deduction abilities, an area that has been relatively unexplored. Specifically, we\\nask the models to complete a set of Raven’s Progressive Matrices (RPMs) problems (Kunda et al.,\\n2013; Zhang et al., 2019), which are frequently used to measure human intelligence, by identifying\\nthe correct pattern to fill in the blank from multiple options. See Figure 1 for illustration. This\\nrequires the models to 1) comprehend each given pattern including the choices, 2) deduce underly-\\ning rules and identify any trend that can explain the evolution of these patterns, and 3) employ the\\nlearned rules to choose the missing pattern from the given options. The model’s capacity to handle\\n∗Equal contribution.\\nPreprint. Under review.arXiv:2403.04732v1  [cs.AI]  7 Mar 2024', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 0}),\n",
       " Document(page_content='Grid:\\nAlternative shapes:\\nABCDEF\\nPrompt:You can see a grid of 9 boxes, one of which is empty (marked as ?). You have to choosewhich of the 6 alternative shapes (A-F) to be placed in the empty box in order to completethe pattern that connects the shapes. Finally, provide your prediction as Answer: “X”From the figure we can see:The grid:    1. Top left box: square with a cross sign.    …    9. Bottom right box: ’?’.The alternative shapes are:A. Triangle with a star .    …    F. Triangle with a cross sign.Analyzing the grid, there seems to be a pattern involving both the shapes and the symbols inside them:    - The symbols inside the shapes alternate between a cross, a star, and then a blank space as you move horizontally across each row.    - The shapes themselves also alternate within each row – this is seen with the square, circle, and triangle repeating in each row in that order.Given the identified patterns, the correct shape should be a triangle with a cross sign. Looking at the alternative shapes, option \"F\" fits the described pattern perfectly. Hence, the answer is: \"F\"PerceptionDeductiveReasoningHypothesisVerification\\nFigure 1: Illustration of the visual deductive reasoning for Raven’s Progressive Matrices. The task\\nrequires intricate coordination among perception, deductive reasoning, and hypothesis verification\\ncapabilities exhibited by Vision-Language Models.\\neach aspect must be effectively coordinated to provide the correct answer. 
Our findings reveal that\\nalthough some problems may seem intuitive to humans, they might not be as intuitive to VLMs.\\nCompared to standard image reasoning tasks like VQA (Antol et al., 2015), RPMs pose several\\nunique challenges: 1) RPMs require sophisticated deductive capabilities that involve multi-hop com-\\nparative reasoning, such as discrimination, relation, and analogy, while VQA typically requires only\\nfew steps of reasoning, 2) RPMs rely solely on visual clues to generate hypotheses and verify them,\\nwhile VQA often involves using natural language to infer the objective and determine which parts\\nto focus on, 3) RPMs are inherently few-shot (mostly 2-shot) learning tasks. Each RPM problem\\nmay have different underlying rules, which demands strong generalization abilities to solve them.\\nHumans have a remarkable ability to learn from just a few examples, and powerful language models\\nlike LLMs have demonstrated this ability in text-based tasks. However, the ability of strong VLMs\\nto solve few-shot reasoning tasks by relying solely on visual cues has not been well studied.\\nAs an emerging field, it is crucial to establish benchmarks and systematic evaluations in order to\\npush the limits of the visual deductive ability of VLMs. Our contributions include:\\n1. We set up a framework for systematically evaluating VLMs on RPM problems. We evaluated\\nseveral SOTA open-source and closed-source VLMs on three diverse datasets, including\\nthe Mensa IQ test, IntelligenceTest, and RA VEN, providing a comprehensive assessment of\\ntheir performance. The results indicate that although LLMs exhibit impressive capabilities\\nin text-based reasoning, such proficiency has not been achieved in image-based reasoning.\\nThe code and evaluation datasets have been released to facilitate future investigation and\\nimprovement over VLMs.2\\n2. 
We employed standard inference-time strategies in LLMs such as in-context learning (Brown\\net al., 2020) and self-consistency (Wang et al., 2022) to probe the potential of VLMs. We\\nfound that some standard strategies that are effective in LLMs do not seamlessly translate\\nto the VLMs we used.\\n2Our code is available at https://github.com/apple/ml-rpm-bench\\n2', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 1}),\n",
       " Document(page_content='3. We finely diagnose the performance bottleneck of VLMs by breaking down their capability\\nintoperception ,deductive reasoning , and hypothesis verification . Our analysis reveals that\\nperception is the limiting factor in current VLMs. To scrutinize this specific “blind spot”\\nin strong VLMs such as GPT-4V , we provide a case study highlighting where issues occur.\\n4. We identified and examined several issues associated with the current VLMs in this task. These\\nissues include overconfidence, sensitivity to prompt design and an inability to effectively\\nleverage in-context examples. We ablated the effects of different prompts on the overall\\nperformance of the model and found models can benefit from more structured prompts.\\n2 Related Work\\nGeneral LLM Reasoning benchmarks Many text-based reasoning tasks and benchmarks have\\nbeen introduced to evaluate LLMs in various domains (Huang & Chang, 2022) such as general\\nknowledge (Hendrycks et al., 2020), math reasoning (Cobbe et al., 2021), commonsense reasoning\\n(Geva et al., 2021; Clark et al., 2018), factual reasoning (Laban et al., 2023), and coding (Chen et al.,\\n2021). Some noteworthy examples of these works are BIG-bench (Srivastava et al., 2022), HELM\\n(Liang et al., 2022), SuperGLUE (Sarlin et al., 2020), and LAMA (Petroni et al., 2019).\\nVisual reasoning evaluation Previous work on visual reasoning tasks has primarily focused on\\ntasks such as visual question answering (VQA) Antol et al. (2015) and image captioning. These\\ntasks involve answering questions about images or generating natural language descriptions of visual\\ncontent. Researchers have also examined the ability of models to understand the relational and com-\\npositional aspects of objects in images. Datasets like CLEVR (Johnson et al., 2017) and SHAPES\\n(Andreas et al., 2016) assess visual reasoning abilities such as counting, comparing, logical reason-\\ning, and storing information in memory. 
As the VLMs abilities to perform visual reasoning have\\nevolved so have the benchmarks. New benchmarks, like MMMU (Yue et al., 2023) and MathVista\\n(Lu et al., 2023) have been developed that test the models’ ability to emulate human-like under-\\nstanding of scenes and objects in images and videos. These benchmarks include areas such as scene\\ntext understanding (Sidorov et al., 2020), formulation (Lu et al., 2024), table and chart interpreta-\\ntion (Lu et al., 2024), the comprehension of visual stimuli (Yang et al., 2023), geometric reasoning\\n(Ahrabian et al., 2024), spatial reasoning (Chen et al., 2024), and facial expression comprehension\\nand reasoning (Yang et al., 2023).\\nDeductive reasoning Deductive reasoning evaluation and benchmarks have been conducted for\\nboth textual and visual domains. Two notable examples are GuessWhat?! (De Vries et al., 2017)\\nand ReferIt (Kazemzadeh et al., 2014), which assess the visual reasoning abilities of the models be-\\ning tested. More recently, LMRL Gym (Abdulhai et al., 2023) and Entity Deduction Arena (Zhang\\net al., 2023) have been introduced as methods to evaluate the ability of LLMs to perform multi-turn\\ndeductive reasoning tasks. Another relevant task is ARC (Acquaviva et al., 2022) which shares simi-\\nlarities with RPMs, as they both require correctly inferring unseen outputs based on given examples.\\nComparing with ARC, RPMs are abstract and requires intricate analogical and relational reasoning.\\n3 Experiment Setting\\n3.1 Dataset\\nIn our paper, we employed three RPMs datasets. The Mensa Test3consists of 35 questions with\\nprogressive levels of difficulty. For the purpose of 1-shot learning, we used the first question as an in-\\ncontext example and reserved the remaining 34 questions for evaluation. The IntelligenceTest (IT)4\\nprovided an IQ test encompassing verbal, pattern recognition, math, and structural components. 
We\\nspecifically focused on pattern recognition, which solely comprised RPMs problems and included\\n66 examples. Additionally, we incorporated the RA VEN dataset (Zhang et al., 2019) for evaluation.\\nThe RA VEN dataset employs a generative model to create RPMs problems using a hierarchical\\npipeline. The test dataset of RA VEN contains 14,000 examples, covering 7 types of distinct figural\\n3https://www.mensa.org/public/mensa-iq-challenge\\n4https://www.intelligencetest.com/questions\\n3', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 2}),\n",
       " Document(page_content='configurations that incorporate different layouts, shapes, and relational structures. In this work, we\\ngenerate 140 new samples, 20 samples for each figural configuration.\\n3.2 Models\\nWe compared various VLMs that represent the state-of-the-art for both closed-source and open-\\nsource models, including GPT4-V (gpt-4-vision-preview) (OpenAI, 2023), Gemini-pro (Team et al.,\\n2023), Qwen-VL-Max (Bai et al., 2023) and LLaV A-1.5-13B (Liu et al., 2023). We use the default\\nsampling method for each of the tested VLMs in our generation process, show in Appendix A.2.\\n3.3 Prompts\\nWe prompt the model with the instruction followed by the query image. For the Mensa set we used\\nthe following prompt:\\nYou can see a grid of 9 boxed, one of which is empty (marked as ?). You have to choose\\nwhich of the 6 alternative shapes (A-F) should be placed in the empty box in order to\\ncomplete the pattern that connects the shapes. Finally, provide your prediction as Answer:\\n“X”.{query image }\\nAppendix A.1 contains the prompts used for all three datasets for different settings. In Section 5.4,\\nwe explain why we chose the particular order of the prompt and query image order.\\n4 Evaluation Results\\n4.1 Evaluation of VLMs on Visual Deductive Reasoning\\nMensa IntelligenceTest (IT) RA VEN\\nEntropy Accuracy ↑Entropy Accuracy ↑Entropy Accuracy ↑\\nGPT-4V 1.49 0.24±0.05 1.40 0.16±0.04 2.07 0.12±0.04\\nGemini Pro 1.24 0.15±0.04 1.18 0.18±0.03 1.37 0.11±0.04\\nQWen-VL-Max 1.13 0.17±0.01 0.97 0.13±0.02 0.48 0.10±0.03\\nLLaV A-1.5-13B 0.72 0.23±0.01 0.64 0.09±0.01 0.25 0.10±0.03\\nRandom Guess 2.58 0.16 2.58 0.16 3.00 0.12\\nTable 1: Benchmark of VLMs on three different datasets. “Entropy” denotes uncertainty of the\\nprediction, and “Accuracy” indicates the percentage of accurately answered questions.\\nIn Table 1 we show how different VLMs performed on each dataset. 
For each model and dataset, we\\ncomputed the statistics by averaging them over 10 repetitions. From the table, it is evident that GPT-\\n4 either slightly surpasses or is on par with the other models across all benchmarks. However, the\\naccuracy gap between the models is not substantial in terms of their ability to solve RPM puzzles.\\nIt is interesting to note that the performance of these models is comparable to random guessing (last\\nrow), indicating their limited effectiveness in this area. Converting the accuracy on the questions to\\nhuman ranking scale, we find that the models rank in the 2-8 percentile on the Mensa tests. On the\\nIT dataset humans demonstrate a wide range of success rates per question, spanning from 30% to\\n93.4%, which is much higher than the highest accuracy of a mere 18% observed for Gemini Pro.\\nSimilarly, on the Raven dataset humans attain an impressive success rate of 84.67% (Zhang et al.,\\n2019), starkly outperforming VLMs, which consistently yield results akin to random guessing.\\nUncertainty of the prediction We analyze the entropy of model predictions in order to assess the\\nuncertainty inherent in their predictive distribution. For the choices set C, the Entropy is defined as\\nS=−P\\ni∈Cpilogpi. If the model consistently predicts a single answer, it is has an entropy of\\n0. If it randomly guesses, the entropy reaches the upper bound shown in the Table 1. We see that\\nGPT-4 and Gemini Pro exhibit a greater diversity of answers, which is also reflected in the greater\\n4', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 3}),\n",
       " Document(page_content='diversity in recognizing and attempting to identify various patterns. On the other hand, LLaV A and\\nQWen-VL produce more deterministic predictions, resulting in lower entropy.\\nWe observed that all the VLMs tend to be confident while being wrong. Interestingly, we observed\\nthat even when the entropy was high, models tried to provide a nonsensical rational, instead of\\nacknowledging their inability to perform the task; this was observed to happen more often with\\nmodels that had higher entropy. All the tested models never express any level of uncertainty by\\nusing words like “likely” or “maybe”. This excessive confidence can presumably be attributed to\\nthe model pretraining and instruction finetuning steps, which typically do not involve calibrating the\\nmodel for uncertainty. Instead, the models are encouraged to generate uncertain content, leading to\\nmore errors in aggregating in the generated output.\\n4.2 Do some standard strategies used in LLMs translate effectively to visual deductive\\nreasoning?\\nWe tried two strategies that have been shown to be effective in various tasks related to text LLMs:\\n1)One-shot (1-shot) (Brown et al., 2020) involves providing a single in-context RPM example and\\nits solution to the VLMs. 2) Self-consistency (SC) (Wang et al., 2022) entails sampling multiple\\nresponses and selecting the answer that receives the majority vote. 
The results are shown in Table 2.\\nMensa IntelligenceTest RA VEN\\nEntropy Accuracy ↑Entropy Accuracy ↑Entropy Accuracy ↑\\nGPT-4V (0-shot) 1.49 0.24±0.05 1.40 0.16±0.04 2.07 0.12±0.04\\nGPT-4V (1-shot) 1.41 0.22±0.06 1.31 0.17±0.04 2.03 0.12±0.04\\nGPT-4V (SC) 0.17 0.31±0.01 0.15 0.19±0.02 0.20 0.10±0.02\\nGemini Pro (0-shot) 1.24 0.15±0.04 1.18 0.18±0.03 1.37 0.11±0.04\\nGemini Pro (1-shot) 0.69 0.17±0.03 0.54 0.19±0.01 1.35 0.10±0.03\\nGemini Pro (SC) 0.03 0.18±0.01 0.03 0.18±0.01 0.08 0.10±0.01\\nTable 2: Expanded benchmark of VLMs on three different datasets, including the 1-shot and SC\\nvariants for both GPT-4 and Gemini models. The prompts are provided in Appendix A.1.\\nVLMs struggle with reading in-context image The performance of the 1-shot evaluation did not\\ndemonstrate a significant improvement compared to the 0-shot evaluation. Specifically, we observed\\nonly a marginal 1% enhancement for the IntelligenceTest dataset, while encountering a decrease of\\n2-4% in accuracy for the Mensa test. Surprisingly, all the tested models, including GPT-4V and\\nGemini, struggle with a high failure rate even when the in-context example is identical to the current\\ntask being solved . This is peculiar because powerful LLMs usually exhibit the ability to analogize\\nand copy the in-context example when provided with the same query. We observed accuracy ranging\\nfrom 10% to 20% for these in-context examples across different datasets, which is comparable to\\nthe accuracy when a different example is used as the in-context example.\\nIn order to make this observation concrete we present an ablation experiment with a specific example\\nwe created manually in the style of Mensa problems, which we call M-easy (See Figure 2a for the\\nproblem and Table 3 for a summary of results). 
Here the same example is used as the in-context\\nexample, and as the task being solved, the model only needs to be able to draw a comparison between\\nthe in-context example and the query, and copy over the answer from the in-context sample5.\\nIn-context Query Accuracy\\nDesc. + Rat. + Ans. Desc. 100%\\nImg. + Desc. + Rat. + Ans. Img. + Desc. 80%\\nImg. + Desc. + Rat. + Ans. Img. 20%\\nImg. Img. + Desc. 80%\\nImg. Img. 40%\\nTable 3: GPT-4V analogizes better when\\nsolely based on text descriptions. Desc., Rat.,\\nAns. and Img. represents description, ratio-\\nnale, answer and image, respectivelyWe first cast the problem as a text-only problem us-\\ning appropriate descriptions for both the in-context\\nexample and the query (row 1). The model demon-\\nstrates a perfect accuracy of 100% showing that it\\nis easy for it to solve this problem when it is rep-\\nresented as text. Next, we added the image to the\\ntextual description for both the in-context example\\nand the query. The accuracy now decreases to 80%,\\neven though additional visual information has been\\n5The results are based on 10 repetitions\\n5', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 4}),\n",
       " Document(page_content='provided (row 2). Finally, when the text description\\nis removed from the query, the accuracy significantly\\ndrops to 20% (row 3). We hypothesize that the drop in accuracy arises because it is much harder\\nfor the model to compare image tokens than it is to compare textual tokens and also that the model\\nutilizes text more than it does the images.\\n5 What limits the performance of the VLMs?\\nGrid:\\nAlternative shapes:\\nABCDEF\\n\"The grid:\\\\n1. Top left box: square with a cross sign.\\\\n2. Top center box: circle with a star.\\\\n3. Top right box: Empty triangle.\\\\n4. Middle left box: empty square. \\\\n5. Middle center box: circle with a cross sign. \\\\n6. Middle right box: triangle with a star. \\\\n7. Bottom left box: square with a star. \\\\n8. Bottom center box: empty circle. \\\\n9. Bottom right box: \"?\".\\\\nThe alternative shapes are:\\\\nA. Triangle with a star .\\\\nB. Triangle with a plus sign.\\\\nC. Circle with a cross sign.\\\\nD. Circle with a star.\\\\nE. Empty triangle.\\\\nF. Triangle with a cross sign.\"\\n(a)M-Easy RPM Problem\\nGrid:\\nAlternative shapes:ABCDEF\\n\"The grid:\\\\n1. Top left box:  A downward-pointing triangle with three dots in it.\\\\n2. Top center box: A leftward-pointing triangle with two dots in it.\\\\n3. Top right box:  An upward-pointing triangle with one dot in it.\\\\n4. Middle left box: A rightward-pointing triangle with two dots in it.\\\\n5. Middle center box:  A downward-pointing triangle with one dot in it.\\\\n6. Middle right box: A leftward-pointing triangle with three dots in it.\\\\n7. Bottom left box:  An upward-pointing triangle with one dot in it.\\\\n8. Bottom center box: A rightward-pointing triangle with three dots in it.\\\\n9. Bottom right box: \\'?\\'.\\\\nThe alternative shapes are:\\\\nA.  An upward-pointing triangle with two dots in it.\\\\nB.  A downward-pointing triangle with one dot in it.\\\\nC. 
A leftward-pointing triangle with one dot in it.\\\\nD. A rightward-pointing triangle with two dots in it.\\\\nE. A leftward-pointing triangle with two dots in it.\\\\nF.  A downward-pointing triangle with two dots in it.\" (b)M-Medium RPM Problem\\nGrid:\\nAlternative shapes:ABCDEF\\n\"The grid:\\\\n1. Top left box: white circle, white triangle, black square.\\\\n2. Top center box: white triangle, black circle, white square.\\\\n3. Top right box: black square, black square, white triangle.\\\\n4. Middle left box: black circle, white square, white triangle.\\\\n5. Middle center box: black square, white triangle, black square.\\\\n6. Middle right box: white triangle, black square, white circle.\\\\n7. Bottom left box: white triangle, black square, black square.\\\\n8. Bottom center box: black square, white circle, white triangle.\\\\n9. Bottom right box: \\'?\\'.\\\\nThe alternative shapes are:\\\\nA. white circle, white triangle, black square.\\\\nB. black circle, white square, white triangle.\\\\nC. white circle, white square, black triangle.\\\\nD. white circle, black square, white triangle.\\\\nE. black square, white triangle, white circle.\\\\nF. white square, white triangle, black circle.\" (c)M-Hard RPM Problem\\nFigure 2: Three manually created RPM problems evaluated for text description augmentation, illus-\\ntrating varying levels of difficulty: easy, medium, and hard. The correct answers are “F, F, F”.\\nPosition Description of M-Easy RPM Description of M-Medium RPM Desc. 
of segmented M-Medium RPM\\nTop left A square with an X inside.Triangle pointing down with three dots\\nforming a vertical line in the center.Inverted triangle with three dots inside.\\nTop center A circle with a star inside.Triangle pointing right with three dots\\nforming a horizontal line along the center.Right-pointing triangle with two dots.\\nTop right An empty triangle.Triangle pointing up with four dots form-\\ning a vertical line in the center.Upright triangle with one dot in the\\ncenter.\\nMiddle left A square with an X inside.Triangle pointing down with two dots\\nforming a horizontal line in the middle.Right-pointing triangle with two dots.\\nMiddle\\ncenterA circle with an X inside.Triangle pointing right with a single dot in\\nthe center.Inverted triangle with one dot in the\\ncenter.\\nMiddle\\nrightA triangle with an X inside.Triangle pointing up with two dots forming\\na vertical line along the center.Left-pointing triangle with three dots.\\nBottom\\nleftA square with a star inside.Triangle pointing down with one dot in the\\ncenter.Upright triangle with one dot in the\\ncenter.\\nBottom\\ncenterA circle.Triangle pointing right with two dots form-\\ning a horizontal line in the middle.Right-pointing triangle with three dots.\\nTable 4: The M-Easy and M-Medium RPMs descriptions from GPT-4V for the patterns can con-\\ntain errors, including hallucinations and Chimera descriptions. When the model is provided with\\nsegmented RPM images (i.e., when patterns are separated into multiple image inputs), it leads to a\\nreduction in the error. Errors are indicated in red.\\nWe investigate why VLMs fail to reach human-level performance in answering even simple ques-\\ntions that are intuitive to humans. For this purpose, as a case study, we manually created three RPMs\\nwith varying degrees of difficulty, as depicted in Figure 2. 
To conduct a fine-grained analysis and\\ndiagnosis of the VLM’s inability to perform this task of visual deductive reasoning with RPMs, we\\ndecompose the evaluation into three consecutive stages:\\n•Perception : assess if the model can understand and describe the patterns in the RPMs.\\n6', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 5}),\n",
       " Document(page_content='•Deductive reasoning : evaluate if the model can discern and articulate underlying rules.\\n•Hypothesis verification : examine the model’s proficiency in formulating a plausible hypoth-\\nesis for the missing pattern and identifying a matching option among alternatives.\\n5.1 How good is the VLM’s perception on this task?\\nWe first asked the model to describe the RPM figures, to assess if they understood the images that\\nwere provided as part of the problem. Surprisingly, even though VLMs are astoundingly accurate\\nin describing commonplace images, they seemed to be quite unsuccessful at accurately describing\\neven the simpler abstract patterns we gave them. The generated descriptions contained numerous\\nerrors across all the tested models , as exemplified by results from GPT-4V in Table 4.\\nWe identified two major issues for this blind spot of VLMs:\\n1.Compounding error : Models tend to replicate the descriptions of previous patterns, leading\\nto an autoregressive amplification of compounding errors in successive descriptions. This\\nresults in an increasingly erroneous narrative throughout the generation process. For exam-\\nple, in Table 4 (M-Medium), When the model first makes a mistake by including “a vertical\\nline” in the description, the subsequent text follows the same error. We think that the au-\\ntoregressive nature of the VLMs causes it to repeat itself, with the preceding text dictating\\nthe entire follow-up text.\\n2.Confounding error : The similarities between patterns cause confusion, as the model struggles\\nto maintain focus on a single pattern. Consequently, we often observe “Chimera descrip-\\ntions” that erroneously combine elements from multiple patterns. For example, in Table 4\\n(M-Easy, middle right), the description seems to combine elements in two adjacent patterns\\n(middle center, middle right). 
This could be attributed to the model’s failure to effectively\\nfocus its attention on the corresponding pattern when all the patterns appear similar.\\nThese two issues are prevalent across all the tested methods and dataset. When the patterns contain\\nmultiple elements and are more detailed, these issues become more severe.\\nFigure 3: Accuracy of the origi-\\nnal RPM as input with that of the\\nsegmented RPM as input. Results\\nbased on 10 repetitions.Can decomposing the RPMs into each single pattern from\\nthe grid enhance perception? Presumably, by decompos-\\ning the patterns into individual components, we can eliminate\\ntheconfounding errors. To investigate this, we first segmented\\neach of the three manual examples shown in Figure 2, into\\n9 individual question patterns and 6 candidate patterns. We\\nthen used a new prompt A.1 for GPT-4V to read both the\\nfull image and the segmented patterns to infer the answer. In\\nthis way, we found GPT-4V would describe each pattern more\\naccurately. The descriptions of the M-Medium RPM can be\\nfound in Table 4. We conducted 10 tests for each RPM and\\nreport the accuracy comparison with and without segmenta-\\ntion in Figure 3. We also verify the segmentation impact using\\nthe Raven dataset (140 examples). We got 16.4% accuracy for\\nsegmented RPMs and 11.4% for non-segmented RPMs. The\\nresults demonstrate a significant reduction in confounding er-\\nrors, confirming the issues discussed earlier.\\nHallucination We have also observed the model generating\\nhallucinations in the descriptions, particularly when it comes\\nto counting. For instance, in Table 4 (M-Medium, top right),\\nthe model erroneously states that there are four dots when, in reality, there are only three.\\nData distribution aspect VLMs are presumably trained primarily on real world images, which\\nmay cause them to be less sensitive to abstract patterns. 
We did not try to finetune the model to test\\nthis hypothesis because of the limited availability of diverse RPM data. We believe that additional\\nfinetuning could potentially improve the performance. However, we hypothesize that finetuning the\\n7', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 6}),\n",
       " Document(page_content='M-Easy\\nModel Acc. Ent. A B C D E F\\nGPT-4V 1-shot 50% 1.69 0 0 1 1 3 5\\nGPT-4V 1-shot + Gen. Desc. (CoT) 50% 1.36 0 1 0 0 4 5\\nGPT-4V 1-shot + Oracle Desc. 60% 1.57 0 1 0 1 2 6\\nGPT-4V 1-shot + Oracle Desc. - Visual 60% 0.97 0 4 0 0 0 6\\nGPT-4V 1-shot + Oracle Desc. + Rationale 60% 1.57 1 0 0 1 2 6\\nM-Medium\\nModel Acc. Ent. A B C D E F\\nGPT-4V 1-shot 20% 2.25 0 3 2 2 1 2\\nGPT-4V 1-shot + Gen. Desc. (CoT) 50% 1.69 0 3 1 1 0 5\\nGPT-4V 1-shot + Oracle Desc. 80% 0.92 1 1 0 0 0 8\\nGPT-4V 1-shot + Oracle Desc. - Visual 60% 1.57 1 0 0 1 2 6\\nGPT-4V 1-shot + Oracle Desc. + Rationale 70% 0.88 3 0 0 0 0 7\\nM-Hard\\nModel Acc. Ent. A B C D E F\\nGPT-4V 1-shot 20% 1.49 0 0 3 0 5 2\\nGPT-4V 1-shot + Gen. Desc. (CoT) 30% 1.97 0 2 2 0 3 3\\nGPT-4V 1-shot + Oracle Desc. 40% 1.85 0 3 0 1 2 4\\nGPT-4V 1-shot + Oracle Desc. - Visual 10% 1.96 2 0 1 5 1 1\\nGPT-4V 1-shot + Oracle Desc. + Rationale 50% 1.49 0 2 0 0 3 5\\nTable 5: Accuracy and breakdown of GPT-4V variants with augmented text description across dif-\\nferent RPMs. Each combination is ran for 10 repetitions. The correct answer “F” is marked in color.\\nmodel with RPMs might not entirely eliminate the compounding andconfounding errors , as they\\nappear to be inherent limitations of the VLMs from training.\\n5.2 How good is the VLM’s deductive reasoning on this task?\\nNext, we assess the model’s ability to perform effective reasoning by conditioning it on the ground\\ntruth text description of the RPMs. We provide the prompts in Appendix A.1.\\nDoes the oracle text description improve the model’s performance? The original evaluation\\n(Tables 1 and 2) requires the model to directly generate the answer, making it difficult to disentangle\\nthe understanding and deductive reasoning aspects. 
To examine the VLMs more closely, we pro-\\nvided each evaluated model with oracle text descriptions that were manually created by the authors.\\nWe then evaluated the models’ performance on the three RPM problems and present the results in\\nTable 5 (GPT-4V + Oracle Desc.). The oracle text descriptions can be found in the Appendix A.3.\\nWe also provide sampled rationale generated by GPT-4V in the Appendix A.4.\\nIt is evident that the model’s performance has been significantly improved with the addition of oracle\\ndescriptions for each pattern (Table 5). The models are able to analyze the given patterns and deduce\\nrules for the M-Easy andM-Medium RPMs, and provide rationale for the problem. For the M-Hard\\nRPM, the models demonstrate some capability of reasoning, albeit with some challenges and is\\nfar from human parity. We provide additional examples in the Appendix. However, it is not clear\\nwhether the models still rely heavily on visual cues or if their reasoning is purely text-based.\\nWill removing the visual cues harm the model? Next, we examine whether textual information\\nalone is sufficient by removing the visual information. The results, shown in Table 5 (GPT-4V\\n+ Oracle Desc. - Visual), are intriguing. Without visual information, the models can maintain a\\nsimilar level of performance for M-Easy andM-Medium RPMs. Notably the result solely rely on\\nthe textual information of the input is superior to the GPT-4V baseline, which mostly rely on visual\\ninformation of the input. However, as the tasks become more challenging ( M-Hard RPM), the\\nmodels start to struggle. The performance is also worse than GPT-4V baseline. This suggests that\\nfor tasks that involve complex spatial layouts and relational reasoning, text alone may be insufficient\\nand potentially confusing, while visual cues may provide additional visual alignment and better\\ncomparative attention. 
In such cases, visual information and textual clues would complement each\\nother and work in synergy to achieve the optimal performance. Interestingly, when we provide\\nGPT-4V with an incorrect description, there is around an 80% chance that the model recognizes the\\nmismatch between the text and the image and responds with statements such as: “I believe there\\nhas been a misinterpretation of the provided image”. The model, nevertheless, still generates some\\nrationale which seems adhere more closely to the text description than to the visual cues.\\n8', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 7}),\n",
       " Document(page_content='Can the performance be improved by reasoning with noisy text descriptions generated by\\nthe model itself? Drawing inspiration from Chain-of-Thoughts (CoT) in the text domain (Wei\\net al., 2022) and the recent Self-Imagine work (Akter et al., 2024), we further investigate whether\\nVLMs can enhance their performance using noisy text descriptions that they generate on their own.\\nThis also helps us understand the extent to which VLM reasoning relies on accurate descriptions\\nof images and the extent to which it can recover from errors in the descriptions. Table 5 (GPT-4V\\n+ Gen Desc.) shows that incorrect text descriptions can still produce a gain. The gap between\\nself-generated descriptions and oracle descriptions, however, varies across the different cases.\\n5.3 How good is the VLM’s hypothesis verification on this task?\\nFinally, We tested the performance of GPT-4V when it received both an oracle description and\\nan oracle rationale. The oracle rationale, which can be found in Appendix A.1, only includes the\\nexplanation of the underlying rule without predicting the final pattern or answer. The results for 10\\nrepetitions on manual examples are shown in Table 5 (GPT-4V + Oracle Desc. + Rationale).\\nSurprisingly, compared to the row representing GPT-4V + Oracle Desc., the oracle rationale did not\\nsignificantly improve accuracy. In cases where the model failed, it sometimes directly generated\\nan incorrect answer and at other times extended the rationale but still generated false answers. 
For\\nexample, for M-easy, GPT-4V continued to generate “the third row should have a star, as the first\\ntwo boxes of the third row (square and circle) already have a star.” This indicates that hypothesis\\ngeneration and verification are closely tied to deductive reasoning, and the model has not yet reached\\nhuman-level performance in following hints and turning learned rules into future predictions.\\nInterestingly, strong models like GPT-4V exhibit some strategies similar to humans. For instance,\\nthey often use the answer options along with the grid to form and tests hypotheses, rather than gen-\\nerating a hypothesis solely based on the grid and then checking for any matches with the alternative\\nshapes.6GPT-4V also sometimes employs a strategy of elimination to rule out incorrect answers\\n(e.g., “the right shape should have a cross sign, which leaves the options to C and F.”).\\n5.4 How does the prompt format influence the model prediction?\\nPrompting Structure Mensa\\nGemini Pro Image First 2.3±1.3\\nGemini Pro Instruction First 5.4±1.2\\nGPT4V 1-Shot w/o Sentinel Token 6.1±1.5\\nGPT4V 1-Shot w/ Sentinel Token 7.8±1.7\\nTable 6: Average number of cor-\\nrect predictions made by GPT4-V\\nand Gemini Pro on the Mensa test,\\ndemonstrating its sensitivity to the\\nstructure of prompts used.The format of the prompt can sometimes significantly im-\\npact the performance of VLM. For example, we found the\\narrangement of task instruction and images is crucial to\\nGemini Pro. We show the results in Table 6. We observed\\na remarkable 200% increase in prediction accuracy when\\nwe simply altered the sequence of these elements. How-\\never, we don’t observe similar conclusion from other tested\\nmodels.\\nWe also delves into the differences in how the model\\nperforms under 0-shot and 1-shot evaluation setups. 
We\\ndiscovered that using special sentinel tokens, such as\\n[{BEGIN/END }OFEXAMPLE] , to clearly separate text prompts from images helps the model delin-\\neate task instructions from in-context examples. This method of structuring prompts is particularly\\neffective in aiding the model’s comprehension across all tested VLMs. For instance, we show the\\nresults of GPT-4V in Table 6.\\nThis study underscores that VLMs, unlike their text-only counterparts, can benefit significantly\\nfrom a more structured format in their task prompts. Furthermore, the interaction between different\\nmodalities, such as text and image, needs to be carefully considered and evaluated.\\n6 Conclusion\\nThis work is a systematic evaluation of the performance of popular Vision-Language Models\\n(VLMs) in a variety of Raven’s Progressive Matrices (RPMs). These tasks serve as a challeng-\\n6This generate-then-verify strategy accounts for less than 10% of GPT-4V’s behavior in our observation.\\nIn such cases the model often rejects the options provided and responds as follows: “Unfortunately, the given\\noptions do not correspond with the identified pattern.”\\n9', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 8}),\n",
       " Document(page_content='ing benchmark for assessing the models’ ability to reason based on visual clues. We observed that\\nthe current state-of-the-art VLMs still fall short of achieving human-level performance on these\\ntasks, with the best-performing models being close-sourced. Our analysis of the models’ perfor-\\nmance reveals that perceptual understanding may be the main bottleneck, as the models perform\\nbetter when provided with appropriate textual descriptions. In future work, it would be intriguing\\nto validate our hypothesis concerning the blind spot of VLMs when it comes to describing patterns.\\nThis investigation has the potential to enhance the general recognition and attentiveness capabilities\\nof VLMs. Additionally, exploring the development of contrastive learning or reinforcement learning\\nalgorithms could further improve the model’s visual deductive reasoning abilities.\\n7 Acknowledgement\\nWe thank Zijin Gu, Yusu Qian, Russ Webb, Yinfei Yang, Zhe Gan for their exceptional contribution\\nto our work.\\n10', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 9}),\n",
       " Document(page_content='References\\nMarwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu,\\nand Sergey Levine. Lmrl gym: Benchmarks for multi-turn reinforcement learning with language\\nmodels. arXiv preprint arXiv:2311.18232 , 2023.\\nSam Acquaviva, Yewen Pu, Marta Kryven, Theodoros Sechopoulos, Catherine Wong, Gabrielle\\nEcanow, Maxwell Nye, Michael Tessler, and Josh Tenenbaum. Communicating natural programs\\nto humans and machines. Advances in Neural Information Processing Systems , 35:3731–3743,\\n2022.\\nKian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, and Jay\\nPujara. The curious case of nonverbal abstract reasoning with multi-modal large language models.\\narXiv preprint arXiv:2401.12117 , 2024.\\nSyeda Nahida Akter, Aman Madaan, Sangwu Lee, Yiming Yang, and Eric Nyberg. Self-imagine:\\nEffective unimodal reasoning with multimodal models using self-imagination. arXiv preprint\\narXiv:2401.08025 , 2024.\\nJacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In\\nCVPR , June 2016.\\nStanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zit-\\nnick, and Devi Parikh. Vqa: Visual question answering. In Proceedings of the IEEE international\\nconference on computer vision , pp. 2425–2433, 2015.\\nJinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang\\nZhou, and Jingren Zhou. Qwen-vl: A frontier large vision-language model with versatile abilities.\\narXiv preprint arXiv:2308.12966 , 2023.\\nTom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal,\\nArvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are\\nfew-shot learners. 
Advances in neural information processing systems , 33:1877–1901, 2020.\\nBoyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh,\\nLeonidas Guibas, and Fei Xia. Spatialvlm: Endowing vision-language models with spatial rea-\\nsoning capabilities. arXiv preprint arXiv:2401.12168 , 2024. URL https://arxiv.org/abs/\\n2401.12168 .\\nMark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared\\nKaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large\\nlanguage models trained on code. arXiv preprint arXiv:2107.03374 , 2021.\\nPeter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and\\nOyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge.\\narXiv preprint arXiv:1803.05457 , 2018.\\nKarl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser,\\nMatthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to\\nsolve math word problems. arXiv preprint arXiv:2110.14168 , 2021.\\nHarm De Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, and Aaron\\nCourville. Guesswhat?! visual object discovery through multi-modal dialogue. In Proceedings of\\nthe IEEE Conference on Computer Vision and Pattern Recognition , pp. 5503–5512, 2017.\\nMor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did aristotle\\nuse a laptop? a question answering benchmark with implicit reasoning strategies. Transactions of\\nthe Association for Computational Linguistics , 9:346–361, 2021.\\nDan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and\\nJacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint\\narXiv:2009.03300 , 2020.\\n11', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 10}),\n",
       " Document(page_content='Jie Huang and Kevin Chen-Chuan Chang. Towards reasoning in large language models: A survey.\\narXiv preprint arXiv:2212.10403 , 2022.\\nJustin Johnson, Bharath Hariharan, Laurens Van Der Maaten, Li Fei-Fei, C Lawrence Zitnick, and\\nRoss Girshick. Clevr: A diagnostic dataset for compositional language and elementary visual\\nreasoning. In CVPR , pp. 2901–2910, 2017.\\nSahar Kazemzadeh, Vicente Ordonez, Mark Matten, and Tamara Berg. ReferItGame: Referring\\nto objects in photographs of natural scenes. In Proceedings of the 2014 Conference on Em-\\npirical Methods in Natural Language Processing (EMNLP) , pp. 787–798, Doha, Qatar, Oc-\\ntober 2014. Association for Computational Linguistics. doi: 10.3115/v1/D14-1086. URL\\nhttps://aclanthology.org/D14-1086 .\\nMaithilee Kunda, Keith McGreggor, and Ashok K Goel. A computational model for solving prob-\\nlems from the raven’s progressive matrices intelligence test using iconic visual representations.\\nCognitive Systems Research , 22:47–66, 2013.\\nPhilippe Laban, Wojciech Kry ´sci´nski, Divyansh Agarwal, Alexander R Fabbri, Caiming Xiong,\\nShafiq Joty, and Chien-Sheng Wu. Llms as factual reasoners: Insights from existing benchmarks\\nand beyond. arXiv preprint arXiv:2305.14540 , 2023.\\nPercy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian\\nZhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language\\nmodels. arXiv preprint arXiv:2211.09110 , 2022.\\nHaotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. arXiv\\npreprint arXiv:2304.08485 , 2023.\\nPan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-\\nWei Chang, Michel Galley, and Jianfeng Gao. Mathvista: Evaluating mathematical reasoning of\\nfoundation models in visual contexts. 
arXiv preprint arXiv:2310.02255 , 2023.\\nPan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-\\nWei Chang, Michel Galley, and Jianfeng Gao. Mathvista: Evaluating mathematical reasoning of\\nfoundation models in visual contexts. In ICLR , 2024.\\nSyeda Nahida Akter, Zichun Yu, Aashiq Muhamed, Tianyue Ou, Alex B ¨auerle, ´Angel Alexander\\nCabrera, Krish Dholakia, Chenyan Xiong, and Graham Neubig. An in-depth look at gemini’s\\nlanguage abilities. arXiv e-prints , pp. arXiv–2312, 2023.\\nOpenAI. Gpt-4 technical report, 2023.\\nFabio Petroni, Tim Rockt ¨aschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller,\\nand Sebastian Riedel. Language models as knowledge bases? In EMNLP , 2019.\\nPaul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue:\\nLearning feature matching with graph neural networks. In Proceedings of the IEEE/CVF confer-\\nence on computer vision and pattern recognition , pp. 4938–4947, 2020.\\nOleksii Sidorov, Ronghang Hu, Marcus Rohrbach, and Amanpreet Singh. Textcaps: a dataset for\\nimage captioning with reading comprehension. In ECCV , pp. 742–758. Springer, 2020.\\nAarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam\\nFisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adri `a Garriga-Alonso, et al. Beyond the\\nimitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint\\narXiv:2206.04615 , 2022.\\nGemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu,\\nRadu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, et al. Gemini: a family of highly\\ncapable multimodal models. arXiv preprint arXiv:2312.11805 , 2023.\\nXuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdh-\\nery, and Denny Zhou. 
Self-consistency improves chain of thought reasoning in language models.\\narXiv preprint arXiv:2203.11171 , 2022.\\n12', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 11}),\n",
       " Document(page_content='Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny\\nZhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in\\nNeural Information Processing Systems , 35:24824–24837, 2022.\\nZhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, and Lijuan\\nWang. the Dawn of Lmms: Preliminary Explorations With Gpt-4v(ision), 2023.\\nXiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens,\\nDongfu Jiang, Weiming Ren, Yuxuan Sun, et al. Mmmu: A massive multi-discipline multimodal\\nunderstanding and reasoning benchmark for expert agi. arXiv preprint arXiv:2311.16502 , 2023.\\nChi Zhang, Feng Gao, Baoxiong Jia, Yixin Zhu, and Song-Chun Zhu. Raven: A dataset for relational\\nand analogical visual reasoning. In CVPR , pp. 5317–5327, 2019.\\nYizhe Zhang, Jiarui Lu, and Navdeep Jaitly. the Entity-deduction Arena: A Playground for\\nProbing the Conversational Reasoning and Planning Capabilities of Llms. arXiv preprint\\narXiv:2310.01468 , 2023.\\n13', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 12}),\n",
       " Document(page_content='A Appendix\\nA.1 Prompts for different dataset and tasks\\nWe listed all the prompts we used to generate the response from different models.\\nMensa. (0-shot)\\nYou can see a 3x3 grid of 9 boxes, one of which is empty (marked as ?). You have to choose\\nwhich of the 6 alternative shapes (A-F) should be placed in the empty box in order to\\ncomplete the pattern that connects the shapes. Finally, provide your prediction as Answer:\\n“X”\\n{query image }\\nMensa (1-shot)\\nYou can see a 3x3 grid of 9 boxes, one of which is empty (marked as ?). You have to choose\\nwhich of the 6 alternative shapes (A-F) to be placed in the empty box in order to complete\\nthe pattern that connects the shapes. Think step by step by first describe the each box in the\\n3x3 grid, and each of the alternative shapes as the candidate answers. Then identify the\\ncommon pattern. Finally, provide your prediction as Answer: ”X”\\nFor example, for the following image:\\n{in-context image }\\n{in-context description }\\n{in-context rationale }\\n{in-context answer }\\nNow do the following one:\\n{query image }\\nMensa (1-shot) + Oracle Desc.\\nYou can see a 3x3 grid of 9 boxes, one of which is empty (marked as ?). You have to choose\\nwhich of the 6 alternative shapes (A-F) to be placed in the empty box in order to complete\\nthe pattern that connects the shapes. Think step by step by first describe the each box in\\nthe 3x3 grid, and each of the alternative shapes as the candidate answers. Then identify\\nthe common pattern. Finally, provide your prediction as Answer: ”X”\\nFor example, for the following image:\\n{in-context image }\\n{in-context description }\\n{in-context rationale }\\n{in-context answer }\\nNow do the following one:\\n{query image }\\n{query oracle description }\\n14', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 13}),\n",
       " Document(page_content='Mensa (1-shot) + Oracle Desc. + Oracle Rationale\\nYou can see a 3x3 grid of 9 boxes, one of which is empty (marked as ?). You have to choose\\nwhich of the 6 alternative shapes (A-F) to be placed in the empty box in order to complete\\nthe pattern that connects the shapes. Think step by step by first describe the each box in\\nthe 3x3 grid, and each of the alternative shapes as the candidate answers. Then identify\\nthe common pattern. Finally, provide your prediction as Answer: ”X”\\nFor example, for the following image:\\n{in-context image }\\n{in-context description }\\n{in-context rationale }\\n{in-context answer }\\nNow do the following one:\\n{query image }\\n{query oracle description }\\n{query oracle rationale }\\nIntelligenceTest (0-shot)\\nThis image is an Intelligence Test Question asking which figure (A-F) belongs in the bottom\\nright box. Please select the correct answer. You must first give your explanation and then\\noutput the answer at the end of your response in the format: “The correct answer is: ”.\\n{query image }\\nRA VEN (0-shot)\\nYou can see a 3x3 grid of 9 boxes, one of which is empty (marked as ?). You have to\\nchoose which of the 8 alternative shapes (A-H) should be placed in the empty box in order\\nto complete the pattern that connects the shapes. You must first give your explanation and\\nthen output the answer at the end of your response in the format: “The correct answer is:\\n”.\\n{query image }\\n15', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 14}),\n",
       " Document(page_content='Segmented Prompt (0-shot)\\nIn the first image, you will see a 3x3 grid of 9 boxes. Each row has three images and is to\\nbe read from left to right, with the last image in the third row is missing (marked as ?). Your\\ntask is to infer the correct pattern that should complete each row based on the sequence\\nobserved in the preceding patterns, and finally select the right option (A, B, C, D, E, F) that\\nfits the 3rd row’s last image.\\n{query image }\\nFor your convenience, I provide 15 segmented figures: the question grid is segmented into\\n9 parts, and the answer options are segmented into 6 parts. q0, q1, and q2 are the first row,\\nq3, q4, and q5 are the second row, and q6, q7, and q8 are the third row. A, B, C, D, E, and\\nF are the answer options. Your task is to find which option should be placed in q8.\\nq0:{q0 image }\\nq1:{q1 image }\\nq2:{q2 image }\\nq3:{q3 image }\\nq4:{q4 image }\\nq5:{q5 image }\\nq6:{q6 image }\\nq7:{q7 image }\\nq8:{q8 image }\\nA:{A image }\\nB:{B image }\\nC:{C image }\\nD:{D image }\\nE:{E image }\\nF:{F image }\\nFor each row, analyze the changes and relationships between the images. Consider the\\nnumber of shapes, the types of shapes, their positions, the shading, and any other changes\\nthat occur from one pattern to the next. Once you have identified the rule or sequence that\\napplies to the rows, select the option (A, B, C, D, E, F) that contains the pattern which\\ncorrectly completes the third row sequence.\\nPlease first give your explanation and then write the answer at the end of your response in\\nthe format: “The correct answer is: ”.\\nA.2 Sampling methods for model evaluations\\nIn the main text, we present evaluation results obtained by independently running each model 10\\ntimes and estimating their respective statistics. 
For GPT4-V , we found that using a zero temperature\\nsetting along with different random seeds effectively balances prediction accuracy with reasoning\\ndiversity. In the case of Gemini Pro Vision, the optimal performance is achieved at a temperature\\nsetting of 0.4. However, for the self-consistency evaluation, we increase the sampling temperature\\nto a maximum of 0.7. This adjustment facilitates more varied predictions which are beneficial for\\nmajority-voting processes. For all other models, we observed that a lower temperature range, be-\\ntween 0and0.2, is necessary. This lower temperature helps the models better adhere to instructions\\nand improves their task prediction accuracy, though it also results in somewhat less variability in the\\nmodels’ predictions.\\n16', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 15}),\n",
       " Document(page_content='M-Easy (Oracle)\\nDescription\\nThe grid:\\n1. Top left box: square with a cross sign.\\n2. Top center box: circle with a star.\\n3. Top right box: Empty triangle.\\n4. Middle left box: empty square.\\n5. Middle center box: circle with a cross sign.\\n6. Middle right box: triangle with a star.\\n7. Bottom left box: square with a star.\\n8. Bottom center box: empty circle.\\n9. Bottom right box: ’?’.\\nThe alternative shapes are:\\n• A. Triangle with a star.\\n• B. Triangle with a plus sign.\\n• C. Circle with a cross sign.\\n• D. Circle with a star.\\n• E. Empty triangle.\\n• F. Triangle with a cross sign.\\nRationale\\nAnalyzing the grid, there seems to be a pattern involving both the shapes and the\\nsymbols inside them:\\n• The symbols inside the shapes alternate between a cross, a star, and then a\\nblank space as you move horizontally across each row.\\n• The shapes themselves also alternate within each row – this is seen with the\\nsquare, circle, and triangle repeating in each row in that order.\\n17', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 16}),\n",
       " Document(page_content='A.3 Oracle description and rationale for each RPMs\\nM-Medium (Oracle)\\nDescription\\nThe grid:\\n1. Top left box: A downward-pointing triangle with three dots in it.\\n2. Top center box: A leftward-pointing triangle with two dots in it.\\n3. Top right box: An upward-pointing triangle with one dot in it.\\n4. Middle left box: A rightward-pointing triangle with two dots in it.\\n5. Middle center box: A downward-pointing triangle with one dot in it.\\n6. Middle right box: A leftward-pointing triangle with three dots in it.\\n7. Bottom left box: An upward-pointing triangle with one dot in it.\\n8. Bottom center box: A rightward-pointing triangle with three dots in it.\\n9. Bottom right box: ’?’.\\nThe alternative shapes are:\\n• A. An upward-pointing triangle with two dots in it.\\n• B. A downward-pointing triangle with one dot in it.\\n• C. A leftward-pointing triangle with one dot in it.\\n• D. A rightward-pointing triangle with two dots in it.\\n• E. A leftward-pointing triangle with two dots in it.\\n• F. A downward-pointing triangle with two dots in it.\\nRationale\\nAnalyzing the grid, it appears that there’s a pattern related to the direction the triangle\\nis pointing and the number of dots within the triangles.\\nFirst, let’s establish the patterns of triangle directions and dots count:\\n• The first row has the triangles pointing downward, to the left, and then up.\\n• The second row has the triangles pointing rightward, downward, and then\\nto the left.\\n• This implies that the direction that the triangle is pointing to is rotating\\nclockwise in each row.\\nNow let’s look at the pattern in the number of dots:\\n• The first row has 3, 2, 1 dots.\\n• The second row has 2, 1, 3 dots.\\n• This implies a pattern of a decreasing sequence from left to right.\\n18', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 17}),\n",
       " Document(page_content='Hard (Oracle)\\nDescription\\nThe grid:\\n1. Top left box: white circle, white triangle, black square.\\n2. Top center box: white triangle, black circle, white square.\\n3. Top right box: black square, black square, white triangle.\\n4. Middle left box: black circle, white square, white triangle.\\n5. Middle center box: black square, white triangle, black square.\\n6. Middle right box: white triangle, black square, white circle.\\n7. Bottom left box: white triangle, black square, black square.\\n8. Bottom center box: black square, white circle, white triangle.\\n9. Bottom right box: ’?’.\\nThe alternative shapes are:\\n• A. white circle, white triangle, black square.\\n• B. black circle, white square, white triangle.\\n• C. white circle, white square, black triangle.\\n• D. white circle, black square, white triangle.\\n• E. black square, white triangle, white circle.\\n• F. white square, white triangle, black circle.\\nRationale\\nAnalyzing the grid, it appears that there’s a pattern along the diagonal lines from top\\nright to bottom left direction.\\n• The three elements in the first diagonal line (top right, middle center, bot-\\ntom left) consist of a black square, a black square, and a white triangle. As\\nwe follow the diagonal line, these elements rotate 120 degrees clockwise.\\n• The three elements in the second diagonal line (top left, middle right, bot-\\ntom center) consist of a white circle, a white triangle, and a black square.\\nAs we follow the diagonal line, these elements rotate 120 degrees clock-\\nwise.\\n• This implies that the third diagonal line (top center, middle left, bottom\\nright) should also follow this pattern.\\nA.4 Rationale generated by GPT-4V\\nIn this paper, we present several examples that demonstrate the reasoning capabilities of GPT-4V\\nfor solving certain RPM tasks. 
From the examples, it becomes evident that as the difficulty level\\nof the problem increases, the GPT-4V tends to struggle with its reasoning abilities and provides an\\nexplanation that is nonsensical.\\n19', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 18}),\n",
       " Document(page_content='Hard (rationale)\\nAnalyzing the grid, it appears that there’s a pattern along the diagonal lines from top right\\nto bottom left direction. - The three elements in the first diagonal line (top right, middle\\ncenter, bottom left) consist of a black square, a black square, and a white triangle. As we\\nfollow the diagonal line, these elements rotate 120 degrees clockwise. - The three elements\\nin the second diagonal line (top left, middle right, bottom center) consist of a white circle, a\\nwhite triangle and a black square. As we follow the diagonal line, these elements rotate 120\\ndegrees clockwise. - This implies that the third diagonal line (top center, middle left, bottom\\nright) should also follow this pattern.\\nGenerated rationale for M-Medium RPM\\nAnalyzing the grid, it appears there’s a pattern related to the direction the triangle is pointing\\nand the number of dots within the triangles.\\nFirst, let’s establish the patterns of triangle directions and dots count:\\n• The first row has the triangles pointing downward, to the left, and then up.\\n• The second row has the triangles pointing rightward, downward, and then to the\\nleft.\\n• It implies that for the third row, following the same pattern, the triangles should\\npoint upward, rightward, and then downward.\\nNow let’s look at the pattern in the number of dots:\\n• Along the vertical columns, we can observe a pattern in dot counts:\\n–The first column has 3, 2, 1 dots.\\n–The second column has 2, 1, 3 dots.\\n–If we follow this pattern, the third column should have 1, 3, 2 dots from top to\\nbottom.\\nGiven these observed sequences, the third row should have a triangle pointing downward\\nwith two dots in it, fitting the patterns of both direction and dot count. Looking at the\\nalternative shapes, option F is the downward-pointing triangle with two dots within it.\\nAnswer: ”F”\\n20', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 19}),\n",
       " Document(page_content='Generated rationale for M-Hard RPM\\nAnalyzing the grid, we can deduce the following pattern:\\n1. The shapes in each box follow a sequence that changes from one cell to the next\\nhorizontally.\\n2. There are three types of shapes: circle, square, and triangle, and each type also has\\ntwo variants of color: black or white.\\nTo understand how the sequence works, we can lay out the patterns we see:\\n• In the top row, we start with a white circle, white triangle, and black square. Moving\\nright, the shapes rotate positions clockwise, and their colors invert (the white circle\\nbecomes a black circle and so on).\\n• The same pattern follows in the middle row: black circle, white square, white trian-\\ngle becomes black square, white triangle, black square, and again inverting colors\\nand rotating the shapes clockwise.\\n• Following this pattern in the bottom row, white triangle, black square, black square\\nshould change by rotating the shapes clockwise and inverting the colors. The black\\nsquare would become a white square, the black square would become a white circle,\\nand the white triangle would stay the same, remaining in the bottom position.\\nMatching this description with the alternatives given, option ”F” (white square, white trian-\\ngle, black circle) fits the observed sequence.\\nAnswer: ”F”\\n21', metadata={'source': 'pdf\\\\2403.04732.pdf', 'page': 20})]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_community.document_loaders import PyPDFDirectoryLoader\n",
    "\n",
    "loader = PyPDFDirectoryLoader(\"pdf/\")\n",
    "\n",
    "docs = loader.load()\n",
    "\n",
    "docs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "b9de7bff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.vectorstores import FAISS\n",
    "from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings\n",
    "embeddings_path = \"D:\\\\ai\\\\download\\\\bge-large-zh-v1.5\"\n",
    "embeddings = HuggingFaceEmbeddings(model_name=embeddings_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "1fbaee57",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<langchain_community.vectorstores.faiss.FAISS at 0x1a7bf4c2020>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vectorStoreDB = FAISS.from_documents(docs,embedding=embeddings)\n",
    "vectorStoreDB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "5c6e7dd4",
   "metadata": {},
   "outputs": [],
   "source": [
    "#使用向量数据库生成检索器\n",
    "retriever = vectorStoreDB.as_retriever(\n",
    "    search_type=\"similarity_score_threshold\", search_kwargs={\"score_threshold\": 0.3}\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "e49dc923",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "template =\"\"\"\n",
    "只根据以下文档回答问题：\n",
    "{context}\n",
    "\n",
    "问题：{question}\n",
    "\"\"\"\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "dd85ac8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.runnables import RunnableParallel,RunnablePassthrough\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "outputParser = StrOutputParser()\n",
    "\n",
    "setup_and_retrieval = RunnableParallel(\n",
    "    {\n",
    "        \"context\":retriever,\n",
    "        \"question\":RunnablePassthrough()\n",
    "    }\n",
    ")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "d7328a1e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'context': [Document(page_content='The Social Impact of Generative AI A P REPRINT\\naddressing environmental challenges . For instance, it may help find innovative solutions to reduce water consumption\\nin the AI industry, highlighting its role in promoting sustainability [George et al., 2023]. Similarly, ChatGPT is also\\nappreciated for its potential as an informative and accountability tool within institutions [Ray, 2023, Cardoso, 2023,\\nBiswas, 2023a]. Lastly, it positively impacts journalism, where it can help with content creation, fact-checking, and\\ngenerating engaging narratives [Davis, 2022, Marr, 2023, Bahrini et al., 2023].\\nConversely, the most recurrent social concern is bias. Several papers [International Journal of Human Rights Law\\nReview, 2023, Ray, 2023, Paul et al., 2023, Lee, 2023, Shidiq, 2023, Biddle, 2023, C and J, 2023, Equality Now,\\n2023, Robson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023a, Vidhya et al., 2023, Khan and Umer, 2023, Ausat et al.,\\n2023, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Treude and Hata, 2023, Farrell et al., 2022, Tiunova\\nand Muñoz, 2023, Thirunavukarasu et al., 2023, Stepanechko and Kozub, 2023, Geertsema et al., 2023] discuss the\\nrisk of deepening existing biases and how ChatGPT can include sexist and racist views due to the characteristics of\\nthe data used during its training. In this sense, there is a special concern about using these tools in various sectors,\\nincluding finance [Khan and Umer, 2023] and other social activities, which can replicate and deepen structural and\\nhistorical inequalities. 
Our analysis also reveals another perceived negative impact, which is its potential to generate\\nfalse information and facilitate the spread of disinformation [Ray, 2023, Paul et al., 2023, Hillemann and Zimprich,\\n2023, Davis, 2022, Lock, 2022, Bahrini et al., 2023, Equality Now, 2023, Robson, 2023, Li, 2023, Biswas, 2023a,\\nVidhya et al., 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Wolf, 2023, Vallance,\\n2022, Rozado, 2023-03, Paul et al., 2023, Solaiman et al., 2019, Tamkin et al., 2021, Farrell et al., 2022, Tiunova and\\nMuñoz, 2023, Khowaja et al., 2023]. This misuse of technology directly affects rights related to access to accurate\\ninformation and freedom of expression, as well as democratic stability. This phenomenon is particularly relevant as,\\nalthough remarkable in artificial intelligence, the advancements and improvements in generative models have raised\\nconcerns and uncertainties regarding their impact on democratic processes . There is a special concern about the\\nease with which false information can be generated and disseminated, especially in critical contexts such as elections,\\nreferendums, political instability, war conflicts, or under dictatorial regimes. Furthermore, this potential negative impact\\nincludes the digital public sphere, where spreading hate speech on social networks [Institute for Human Rights and\\nBusiness, 2023] can lead to social fragmentation and bolster the manipulation of democratic institutions and social\\ncontrol. 
False information can be weaponized across various domains, from the stock market to information warfare\\nand propaganda.\\nIn the same vein, another relevant concern is privacy [International Journal of Human Rights Law Review, 2023, Ray,\\n2023, Paul et al., 2023, Deo, 2023, Lee, 2023, Bahrini et al., 2023, Mhlanga, 2023, C and J, 2023, Biswas, 2023a,\\nRobson, 2023, Perrigo, 2023, Li, 2023, Biswas, 2023b, Vidhya et al., 2023, Khan and Umer, 2023, Helberger and\\nDiakopoulos, 2023, Vallance, 2022, Khlaif, 2023, Paul et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nIskender, 2023]. Different aspects contribute to this concern, including the coverage of these models under AI Act\\nregulations, its extensive use of user data (particularly minors), the potential for surveillance applications, the data\\nvulnerability to cyber attacks targeting ChatGPT, and the privacy implications when integrating the model into various\\ndomains such as education and the military. Privacy is a growing concern as illustrated by the case of the Italian state\\nban. In this case, the Italian authorities asked OpenAI to “expand its privacy policy for users and made it also accessible\\nfrom the sign-up page prior to registration with the service” [Garante per la protezione dei dati personali, 2023] in order\\nto operate in Italy. 
This case highlights the urgency for ensuring responsible and transparent use of the technology.\\nAnother important negative impact is the risk of job loss , as automation and AI capabilities advance [Davis, 2022, Deo,\\n2023, Rivas and Zhao, 2023, Lee and Yilmaz Soylu, 2023, Biddle, 2023, Robson, 2023, Khan and Umer, 2023, Air\\net al., 2023, Wolf, 2023, Curtis and ChatGPT§, 2023, Gabbiadini et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023].\\nFurthermore, other papers show apprehension about over-regulation [Helberger and Diakopoulos, 2023], highlighting\\nthe needed balance between ensuring ethical and responsible use of AI technologies while avoiding stifling innovation.\\nAdditionally, the model’s own cybersecurity is listed as a negative impact [Levente, 2023, Bahrini et al., 2023, C\\nand J, 2023, Khan and Umer, 2023, Helberger and Diakopoulos, 2023, Air et al., 2023, Khowaja et al., 2023].\\nThe comprehensive list of negative perceptions on ChatGPT impacts can be found in Table A1 in online appendix\\n[Baldassarre et al., 2023], providing further insights into the concerns found in the literature.\\nRQ2. What are the emerging trends perceived in ChatGPT development?\\nAn essential part of our review shows emerging trends [Hillemann and Zimprich, 2023, Telegraph, 2023, DiBenedetto,\\n2023, Deo, 2023, Equality Now, 2023, Institute for Human Rights and Business, 2023, Kumordzie, 2023, Vallance,\\n2022, Perrigo, 2023, Al Ashry, 2023, Bareis, 2023, Curtis and ChatGPT§, 2023, Rutinowski et al., 2023, Lee and\\nYilmaz Soylu, 2023, Rozado, 2023-03, George et al., 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas, 2023b,\\nVidhya et al., 2023, Suguri Motoki et al., 2023, Rivas and Zhao, 2023, Khlaif, 2023, Bahrini et al., 2023, Ray, 2023, Khan\\nand Umer, 2023, Ausat et al., 2023, Paul et al., 2023, Tamkin et al., 2021, Solaiman et al., 2019, Lee and Yilmaz Soylu,\\n10', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 9}),\n",
       "  Document(page_content='The Social Impact of Generative AI A P REPRINT\\n2023, Geertsema et al., 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023, Bruff, 2023]. Within this category we found uncertainty surrounding copyright , which is a prominent trend,\\nraising doubts about the possibility of profiting from chat-generated content and whether such content is subject to\\ncopyright protection. Resolving these issues requires legislative intervention, leading to a particular and complex debate\\namong authorities regarding the legal implications of AI-generated content [Hillemann and Zimprich, 2023]. There is\\nalso a growing demand for the enhancement of regulatory frameworks to safeguard original content, considering that\\nmodels like ChatGPT have the potential to negatively impact the work of scientists, writers, researchers, and artists\\n[Al Ashry, 2023, Ray, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023, Iskender, 2023, Stepanechko and Kozub,\\n2023]. Notably, analyses such as by Khowaja et al. [Khowaja et al., 2023] raise crucial questions regarding ownership\\nrights over the data used to train the model and the ownership of the model itself. As mentioned earlier, there is a\\ndominant call for establishing ethical use guidelines , both at a general societal level [Gabbiadini et al., 2023] and\\nspecifically within educational and research institutions [Mhlanga, 2023, Khlaif, 2023, Ausat et al., 2023, Solaiman\\net al., 2019, Tamkin et al., 2021, Lee and Yilmaz Soylu, 2023, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023,\\nIskender, 2023, Stepanechko and Kozub, 2023, Bruff, 2023]. 
These guidelines will be pivotal in ensuring responsible\\nadoption models such as ChatGPT.\\nAnother emerging trend is developing transparency mechanisms [Equality Now, 2023, Lee and Yilmaz Soylu, 2023,\\nLee, 2023, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Tiunova and Muñoz, 2023, Khowaja et al., 2023,\\nStepanechko and Kozub, 2023]. Transparency is considered a vital strategy to address resistance toward adopting\\nthese models in various contexts [Bahrini et al., 2023] and to mitigate potential AI stigmatization. However, it is also\\nacknowledged that transparency poses challenges that need to be overcome [Khowaja et al., 2023].\\nAnother crucial issue to highlight is the accountability for potentially harmful uses of such technology [Deo, 2023,\\nBiswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and Kozub,\\n2023, Helberger and Diakopoulos, 2023], which involves both the end-users and the companies responsible for its\\ndevelopment [Helberger and Diakopoulos, 2023, Deo, 2023, Institute for Human Rights and Business, 2023, George\\net al., 2023, Biswas, 2023b, Vidhya et al., 2023, Ray, 2023, Paul et al., 2023, Khowaja et al., 2023, Stepanechko and\\nKozub, 2023]. This concern extends to applications in sensitive domains like the military [Deo, 2023]. Furthermore,\\nother several notable trends emerge, including the need for timely and appropriate regulation [Gabbiadini et al.,\\n2023, Bruff, 2023], the existence of political bias (29, 40, 42, 62), transformations within the AI market [Deo,\\n2023, Equality Now, 2023, Kumordzie, 2023, Perrigo, 2023, Tamkin et al., 2021, Farrell et al., 2022, Geertsema et al.,\\n2023], and the integration of renewable technologies and environmental awareness within this field [C and J, 2023,\\nInternational Journal of Human Rights Law Review, 2023].\\nRQ3. 
Which areas of improvement can be identified in the development of such technologies?\\nThe findings concerning areas for improvement within the field of generative AI present a diverse range of perspectives\\n[Hillemann and Zimprich, 2023, Helberger and Diakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023,\\nInstitute for Human Rights and Business, 2023, WeeTech Solution, 2023, Levente, 2023, Air et al., 2023, Kumordzie,\\n2023, Wolf, 2023, Vallance, 2022, Perrigo, 2023, Al Ashry, 2023, Levente, 2023, Sun and Hoelscher, 2023, Biswas,\\n2023a, Zhuo et al., 2023, Abdullah et al., 2022, Bahrini et al., 2023, Ray, 2023, Khan and Umer, 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Solaiman et al., 2019, Tamkin et al., 2021, C and J, 2023, Tiunova and Muñoz,\\n2023, Khowaja et al., 2023, Gabbiadini et al., 2023, Gupta et al., 2023, Iskender, 2023, Stepanechko and Kozub, 2023,\\nBruff, 2023]. One prominent area is the examination of regulations [Hillemann and Zimprich, 2023, Helberger and\\nDiakopoulos, 2023, Deo, 2023, Equality Now, 2023, Biddle, 2023, Institute for Human Rights and Business, 2023,\\nKumordzie, 2023, Wolf, 2023, Al Ashry, 2023, George et al., 2023, Levente, 2023, Lee and Yilmaz Soylu, 2023, Ray,\\n2023, Solaiman et al., 2019, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Gabbiadini et al., 2023, Bruff, 2023],\\nparticularly regarding whether the risk-based approach outlined in the AI Act effectively covers generative models\\n[Helberger and Diakopoulos, 2023, Wolf, 2023]. It is suggested that comprehensive guidelines should encompass the\\nentire spectrum, from its application to the AI Research and Development (R&D) [George et al., 2023]. Furthermore,\\nthere is a growing advocacy for a people-centred vision [Wolf, 2023, Kumordzie, 2023, Ray, 2023] that emphasizes\\nthe importance of human rights and ethical considerations in designing and implementing generative AI systems.\\nSpecifically, Solaiman et al. 
[Solaiman et al., 2019] examine the need to \"build frameworks for navigating trade-offs\"\\nand develop decision-making frameworks that account for the complexities and potential trade-offs associated with\\ngenerative AI. Similarly, Li [Levente, 2023] highlights the necessity to transform the European regulatory paradigm to\\neffectively address the challenges posed by LLMs.\\nAnother area of opportunity lies in addressing technical limitations within generative AI models [Levente, 2023, Curtis\\nand ChatGPT§, 2023, Sun and Hoelscher, 2023, Abdullah et al., 2022, Bahrini et al., 2023, Cao et al., 2023, Paul et al.,\\n2023, Sobieszek and Price, 2022, Tamkin et al., 2021, Tiunova and Muñoz, 2023, Thirunavukarasu et al., 2023, Gupta\\net al., 2023, Iskender, 2023]. For instance, a significant challenge is the presence of fictional references [Tiunova\\n11', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 10}),\n",
       "  Document(page_content='The Social Impact of Generative AI A P REPRINT\\nFigure 4: Sankey diagram illustrating the distribution of code groups across document groups\\n6 Discussion\\nRQ1. What are the perceived positive and negative impacts of ChatGPT in contemporary society?\\nThroughout our literature review, we identified several positive and negative impacts attributed to ChatGPT. Noteworthy\\nbenefits include: the potential for enhancing customer service; multiple papers emphasize the positive impact of\\nChatGPT in this domain [International Journal of Human Rights Law Review, 2023, Ray, 2023, Paul et al., 2023,\\nHillemann and Zimprich, 2023, Davis, 2022, Lock, 2022, Deo, 2023, Rivas and Zhao, 2023, Kumordzie, 2023, Levente,\\n2023, Marr, 2023, Abdullah et al., 2022, Khowaja et al., 2023, Gupta et al., 2023, Bruff, 2023, Iskender, 2023]. The\\nmodel is highlighted as an enabler of cross-cultural dialogue, facilitating communication between individuals from\\ndifferent cultural backgrounds [International Journal of Human Rights Law Review, 2023]. Moreover, ChatGPT offers\\nthe advantage of automating repetitive tasks , freeing time for more complex and value-added activities [Davis, 2022,\\nLevente, 2023, Khan and Umer, 2023, Gupta et al., 2023]. These benefits extend to various sectors, including business\\nand healthcare [Deo, 2023]. Another key advantage is its availability around the clock . This 24/7 accessibility proves\\nvaluable in commercial, healthcare, and educational contexts [Paul et al., 2023, Levente, 2023, Lee, 2023, Sun and\\nHoelscher, 2023, Cardoso, 2023]. 
The model’s continuous availability ensures timely assistance and support, covering\\nusers’ diverse needs.\\nChatGPT also demonstrates significant potential in education , offering various advantages [Marr, 2023, Shidiq, 2023,\\nLee and Yilmaz Soylu, 2023, Sun and Hoelscher, 2023, Mhlanga, 2023, Rivas and Zhao, 2023, Abdullah et al., 2022,\\nBahrini et al., 2023, Ray, 2023, Ausat et al., 2023, Solaiman et al., 2019, Lee and Yilmaz Soylu, 2023, Tiunova and\\nMuñoz, 2023, Iskender, 2023, Bruff, 2023, Geertsema et al., 2023]. Despite concerns about plagiarism and academic\\nintegrity, the model’s integration can enhance teaching practices in several ways. It can, for example, automates\\ncurriculum creation, enabling educators to save time and streamline the process [Marr, 2023, Bahrini et al., 2023].\\nMoreover, it facilitates the development of innovative educational content, fostering an engaging learning environment\\n[Lee, 2023, Shidiq, 2023]. Additionally, the model serves as a personalized study support assistant, providing tailored\\nguidance and assistance to individual learners [Ray, 2023, Rivas and Zhao, 2023, Sun and Hoelscher, 2023, Mhlanga,\\n2023, Abdullah et al., 2022, Ausat et al., 2023]. In this perspective, a vision emphasizes the need for controlled\\nintegration and adherence to academic guidelines to ensure responsible and ethical use of generative AI models in\\neducation [Sun and Hoelscher, 2023]. By establishing appropriate regulations and ethical frameworks, the educational\\nbenefits of ChatGPT can be maximized while addressing concerns related to plagiarism and promoting an enriching\\nlearning experience.\\nIn the medical field, the model has shown promise in research, data analysis, and telemedicine applications, contributing\\nto advancements in healthcare [Bahrini et al., 2023]. Furthermore, diverse papers recognize its potential contribution to\\n9', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 8}),\n",
       "  Document(page_content='The Social Impact of Generative AI A P REPRINT\\nThese perspectives, as human behavior changes over time, interact and mutually influence and foster one another. In the\\ncase of ChatGPT, we also observe these attitudes, including optimism, pessimism, and skepticism, but the panorama is\\neven more complex as we present below in our literature review.\\nAs evidence of the increasing interest in ChatGPT and AI, Figure 1 depicts the fluctuations in Google search queries for\\n\"ChatGPT\" globally from May 2020 (the release of ChatGPT) until May 2023 (which includes the time frame of our\\nresearch).\\nFigure 1: Search queries evolution on “ChatGPT” from May 2020 to May 2023 via Google Trends ( https://trends.\\ngoogle.it/trends/explore?date=2020-04-28%202023-05-12&q=Chat%20GPT3&hl=en ). The \"note\" in the\\ngraph reflects an improvement in Google’s data collection system implemented on 1 January 22.\\nFig.1, furthermore, depicts a consistent low level of interest until December 2022, when a significant shift occurred,\\ncoinciding with the months following the launch of Open AI’s ChatGPT as a prototype service on 30 November 2022\\n[Gordon, 2022], which attracted global attention. In April and May of 2023, interest peaked.\\nAs we enter the third wave of AI evolution, examining how socialization processes change and assessing scientific\\ninnovations’ potential positive or negative effects on rights and freedoms is critical. Following Pinch and Bijker [Pinch\\nand Bijker, 1984], it is essential to examine the construction of scientific knowledge across different localities and\\ncontexts. 
As a result, the primary goal of this analysis is to assess the evolution of ChatGPT in recent years and to\\nexamine its current perceived impact on various social aspects, in the context of the ongoing wave of AI evolution.\\n3 STATE OF THE ART\\nGenerative pre-training (GP) was a well-known concept in machine learning applications since 2012 [noa, 2012].\\nLater, in 2017 Google introduced the transformer architecture [Vaswani et al., 2017]. These advancements led to the\\nbirth of large language models like BERT in 2018 [Devlin et al., 2019] and XLNet in 2019 [Yang et al., 2019]: these are\\npre-trained transformers (PT) but are not designed to be generative. A language model is a probability distribution over\\nsequences of words [Jurafsky and Martin, 2008]: given any sequence of words of length m, a language model assigns a\\nprobability to the whole sequence. A Large Language Model (LLM) is a language model based on a neural network\\nwith many parameters (typically billions or more). Prior to transformer-based architectures, the best-performing neural\\nNatural Language Processing (NLP) models employed supervised learning from large amounts of manually-labeled\\ndata.\\nThe main drawbacks of using supervised learning are the impossibility to use it on not well-annotated datasets, and\\nalso the prohibitive cost and time required to train extremely large language models [Radford and Narasimhan, 2018].\\nUsually, LLMs trained on a large quantity of data can perform discretely a good number of tasks; anyway, they can be\\nfine-tuned (i.e., further trained on specific data) to execute a specific task with better performance.\\nLater, in 2018 OpenAI [OpenAI, 2018] published its famous article \" Improving Language Understanding by Generative\\nPre-Training \", in which the first Generative Pre-trained Transformer (GPT) system was introduced [Radford and\\nNarasimhan, 2018]. 
GPT is a type of large language model (LLM) used mainly for Generative AI, which is a type of AI\\ncapable of generating various kinds of content, such as text and images, in response to instructions (also known as\\nprompts). Generative AI models learn the patterns and structure of their input training data and then generate new data\\nthat has similar characteristics, according to what has been asked as a prompt.\\n3', metadata={'source': 'pdf\\\\2403.04667.pdf', 'page': 2})],\n",
       " 'question': 'ChatGPT在法律层面有哪些影响？'}"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result = setup_and_retrieval.invoke(\"ChatGPT在法律层面有哪些影响？\")\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "e0c3dadc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ChatGPT对法律领域的影响是多方面的，包括潜在的积极和消极方面。以下是一些关键点：\\n\\n1. **法律咨询和教育**：ChatGPT可以提供有关基本法律问题的信息和建议，并帮助人们更好地了解他们的权利和义务。然而，它不能替代专业律师的建议，因为它的建议可能不适用于特定情况或地区。此外，用户可能会错误地依赖其建议，导致潜在的法律问题。\\n2. **合同起草和审查**：ChatGPT可以协助起草简单的合同，但无法处理复杂的法律文件。此外，它提供的合同可能没有考虑到所有相关因素，因此需要专业律师的审查。\\n3. **案件研究**：ChatGPT可以帮助律师进行初步的案例研究，但它不能替代对法律文献的深入分析。它的建议可能不准确或过时，因此需要专业人士的验证。\\n4. **监管和合规性**：ChatGPT可以协助企业了解和遵守相关法规，但无法处理复杂的合规问题。它提供的信息可能不完整或不准确，因此需要专业顾问的审查。\\n5. **数据隐私和保护**：ChatGPT的使用可能会引发数据隐私问题，因为它需要收集和使用大量的个人数据来提供服务。如果这些数据没有得到妥善的保护和管理，可能会导致严重的法律后果。\\n6. **知识产权**：ChatGPT生成的文本可能存在版权问题，因为它们是基于训练数据集中的现有文本生成的。这可能导致潜在的法律纠纷。'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = setup_and_retrieval | prompt | model | outputParser\n",
    "chain.invoke(\"ChatGPT在法律层面有哪些影响？\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
